Commit graph

9738 commits

Author SHA1 Message Date
Yossi Itigin
0522179efc pml/yalla: use opal_datatype_span() to get contig type length.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-07-10 01:25:42 +03:00
Yossi Itigin
e94c6b16f0 pml/yalla: fix getting size of a continuous type.
pull request #3765 introduced a bug where the extent of a type is used
instead of its size.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-07-07 19:03:54 +03:00
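The fix above hinges on the difference between a datatype's size (bytes of actual data) and its extent (distance from its first to its last byte, gaps included). A minimal sketch, assuming a simplified vector-like type with lower bound 0 and no resizing; the `vec_type` struct and both helpers are hypothetical and are not OPAL's datatype API. For a densely packed type the two values coincide, which is why such a bug can stay hidden until a strided or resized type is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: `count` blocks of `blocklen` elements,
 * consecutive blocks starting `stride` elements apart. */
struct vec_type {
    size_t count, blocklen, stride, elem_size;
};

/* Size: only the bytes that carry data (what must actually be transferred). */
static size_t vec_size(const struct vec_type *t)
{
    return t->count * t->blocklen * t->elem_size;
}

/* Extent: first byte to one past the last byte, including the gaps
 * between blocks (lower bound 0 assumed, no trailing gap). */
static size_t vec_extent(const struct vec_type *t)
{
    assert(t->count > 0);
    return ((t->count - 1) * t->stride + t->blocklen) * t->elem_size;
}
```

For example, three 4-byte elements taken every second slot have a size of 12 bytes but an extent of 20 bytes.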
Chris Ward
5de3d5dde6 Fix MPI_SIZEOF for gfortran 4.8
Add copyrights.

Revise the README to take out the 'most notably' statement about GNU Fortran 4.8

Signed-off-by: Chris Ward <tjcw@uk.ibm.com>
2017-07-07 13:47:35 +01:00
Gilles Gouaillardet
fc11c37223 Merge pull request #3646 from ggouaillardet/spacc-fix-coverity-warnings
coll/spacc: misc fixes
2017-07-06 11:39:14 +09:00
Mikhail Kurnosov
44acc92104 Fix buffer overflow
Add check for bounds of sindex[] and rindex[].

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2017-07-06 10:49:08 +09:00
Gilles Gouaillardet
5fceca235b coll/spacc: silence more coverity warnings in mca_coll_spacc_allreduce_intra_redscat_allgather()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-06 10:49:08 +09:00
Mikhail Kurnosov
2f0f476642 Silence spacc coverity warnings
1. Add assert for opal_hibit return value: comm_size is always > 1.
2. Modified verbose output (dead-code warning).

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2017-07-06 10:49:08 +09:00
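One plausible reason such an assertion silences an analyzer: a "highest set bit" helper typically returns -1 when no bit is set, and shifting by a negative amount is undefined behavior in C. The sketch below uses a hypothetical `hibit()` in place of `opal_hibit()` (whose real signature is not reproduced here) to show how the precondition `comm_size > 1` makes the derived power of two well defined.

```c
#include <assert.h>

/* Hypothetical stand-in for a highest-set-bit helper: index of the most
 * significant 1 bit of value, or -1 if value <= 0. */
static int hibit(int value)
{
    int bit = -1;
    while (value > 0) {
        value >>= 1;
        bit++;
    }
    return bit;
}

/* Largest power of two not exceeding comm_size. With comm_size > 1,
 * hibit() returns at least 1, so the shift amount is never negative. */
static int nprocs_pof2(int comm_size)
{
    int h = hibit(comm_size);
    assert(comm_size > 1 && h >= 1);
    return 1 << h;
}
```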
Ralph Castain
31130a4bee Replace syntax with something less strictly C99
Fixes #3809

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-05 16:54:36 -07:00
Gilles Gouaillardet
d1c5955b73 coll/base: optimize handling of zero-byte datatypes in mca_coll_base_alltoallv_intra_basic_inplace()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-30 09:47:08 +09:00
Ralph Castain
bd4a6fee22 Attempt to detect when we are direct-launched without the necessary PMI support, and thus are incorrectly identified as being "singleton". Advise the user on the required PMI(x) support and error out.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-29 15:26:53 -07:00
Gilles Gouaillardet
7e5e5fe887 Merge pull request #3719 from ggouaillardet/topic/libnbc_revamp
coll/libnbc: revisit NBC_Handle usage
2017-06-29 11:13:58 +09:00
Nathan Hjelm
022c658bbf osc/rdma: rework locking code to improve behavior of unlock
This commit changes the locking code to allow the lock release to be
non-blocking. This helps with releasing the accumulate lock which may
occur in a BTL callback.

Fixes #3616

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-27 15:29:51 -06:00
George Bosilca
f8ffec926e Protect the monitoring infrastructure initialization.
2017-06-27 18:35:24 +02:00
Clément FOYER
c885ee3f3c Fix Coverity warning CID 1413323 (#3764)
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
2017-06-27 12:39:31 +02:00
bosilca
d55b666834 Topic/monitoring (#3109)
Add a monitoring PML, OSC and IO. They track all data exchanges between processes,
with the capability to include or exclude collective traffic. The monitoring infrastructure is
driven using MPI_T, and can be turned off and on at any time on any communicators/files/windows.
Documentation and examples have been added, as well as a shared library that can be
used with LD_PRELOAD and that allows the monitoring of any application.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>


* Add the ability to query pml monitoring results with the MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix a conversion problem and add a comment about the lack of component
retain in the new component infrastructure.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Allow the pvar to be written by invoking the associated callback.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Various fixes for the monitoring.
Allocate all counting arrays in a single allocation
Don't delay the initialization (do it at the first add_proc as we
know the number of processes in MPI_COMM_WORLD)

Add a choice: with or without MPI_T (default).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Cleanup for the monitoring module.
Fixed a few bugs, and reshaped the operations to prepare for
global or communicator-based monitoring. Start integrating
support for MPI_T as well as MCA monitoring.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Adding documentation about how to use pml_monitoring component.

The document presents usage with and without MPI_T.
It may not reflect exactly how it works right now, but should reflect
how it should work in the end.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change rank into MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c.
Change mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Improve monitoring support (including integration with MPI_T)

Use mca_pml_monitoring_enable to check status state. Set mca_pml_monitoring_current_filename iif parameter is set
Allow 3 modes for pml_monitoring_enable_output: - 1 : stdout; - 2 : stderr; - 3 : filename
Fix test: 1 for differentiated messages, >1 for non-differentiated. Fix output.
Add documentation for pml_monitoring_enable_output parameter. Remove useless parameter in example
Set filename only if using mpi tools
Add missing parameters for fprintf in monitoring_flush (for output in the stdout/stderr cases)
Fix expected output/results for example header
Fix example when using MPI_Tools: a null pointer can't be passed directly. It needs to be a pointer to a null pointer
Base whether to output or not on message count, in order to print something if only empty messages are exchanged
Add a new example on how to access performance variables from within the code
Allocate arrays regarding value returned by binding

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add overhead benchmark, with script to use data and create graphs out of the results
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix segfault error at end when not loading pml
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Start creating a common monitoring module. Factorise version numbering
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix microbenchmarks script
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Improve readability of code

NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add osc monitoring component

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add error checking if running out of memory in osc_monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Resolve a segfault caused by double freeing filename
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Moving to ompi/mca/common the proper parts of the monitoring system
Using common functions instead of pml specific one. Removing pml ones.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add calls to record monitored data from osc. Use common function to translate ranks.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix test_overhead benchmark script distribution

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix linking library with mca/common

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add passive operations in monitoring_test

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix from rank calculation. Add more detailed error messages

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix osc_monitoring mget_message_count function call

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change common_monitoring function names to respect the naming convention. Move to common_finalize the common parts of finalization. Add some comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring common output system

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add error message when trying to flush to a file and open fails. Remove erroneous info message when flushing while the monitoring is already disabled.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Consistent output file name (with and without MPI_T).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Always output to a file when flushing at pvar_stop(flush).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update the monitoring documentation.
Complete information from HowTo. Fix a few mistakes and typos.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use the world_rank for printf's.
Fix name generation for output files when using MPI_T. Minor changes in benchmarks starting script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add security check for unique initialization for osc monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Reduce the number of symbols visible outside mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Remove use of __sync_* built-ins. Use opal_atomic_* instead.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Allocate the hashtable on common/monitoring component initialization. Define symbols to set the values for error/warning/info verbose output. Use opal_atomic instead of built-in function in osc/monitoring template initialization.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Deleting now useless file : moved to common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram distribution of message sizes

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram array of base-2 log of message sizes. Use simple call to reset/allocate arrays in common_monitoring.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add information to the dump file. Separate monitored data per category (pt2pt/osc/coll (to come))

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add coll component for collective communications monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix warning messages : use c_name as the magic id is not always defined. Moreover, there was a % missing. Add call to release underlying modules. Add debug info messages. Add warning which may lead to further analysis.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix log10_2 constant initialization. Fix index calculation for histogram array.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add debug info messages to follow more easily initialization steps.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Group all the var/pvar definitions to common_monitoring. Separate initial filename from the current one, to ease its lifetime management. Add verifications to ensure common is initialized once only. Move state variable management to common_monitoring.
monitoring_filter only indicates if filtering is activated.
Fix out of range access in histogram.
List is not used with the struct mca_monitoring_coll_data_t, so inherit only from opal_object_t.
Remove useless dead code.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix invalid memory allocation. Initialize initial_filename to empty string to avoid invalid read in mca_base_var_register.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Don't install the test scripts.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix missing procs in hashtable. Cache coll monitoring data.
    * Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
    * Cache monitoring data relative to collectives operations on creation.
    * Remove double caching.
    * Use same proc name definition for hash table when inserting and
      when retrieving.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use intermediate variable to avoid invalid write while retrieving ranks in hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add missing release of the last element in flush_all. Add release of the hashtable in finalize.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use a linked list instead of a hashtable to keep track of communicator data. Add release of the structure at finalize time.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Set world_rank from hashtable only if found

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use predefined symbol from opal system to print int

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Move collective monitoring data to a hashtable. Add pvar to access the monitoring_coll_data. Move functions header to a private file only to be used in ompi/mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix pvar registration. Use OMPI_ERROR instead of -1 as returned error value. Fix releasing of coll_data_t objects. Assign the value only if data is found in the hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add automated check (with MPI_Tools) of monitoring.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix procs list caching in common_monitoring_coll_data_t

    * Fix monitoring_coll_data type definition.
    * Use size(COMM_WORLD)-1 to determine max number of digits.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix coll operations monitoring. Update check_monitoring according to the added pvar. Fix monitoring array allocation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Documentation update.
Update and then move the latex and README documentation to a more logical place

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof. Unify OSC related outputs.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add the use of a machine file for overhead benchmark

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Check for out-of-bound write in histogram

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix common_monitoring_cache object init for MPI_COMM_WORLD

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add RDMA benchmarks to test_overhead
Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add computation of average and median of overheads. Add comments and copyrights to the test_overhead script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add technical documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adapt to the new definition of communicators

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update expected output in test/monitoring/monitoring_test.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add dumping histogram in edge case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add consistency in header inclusion.
Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
Add sanity check before emptying hashtable.
Fix typos in documentation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* misc monitoring fixes

* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am
and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

* Cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Changing int64_t to size_t.
Keep the size_t used across all monitoring components.
Adapt the documentation.
Remove useless MPI_Request and MPI_Status from monitoring_test.c.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add parameter for RMA test case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean the maximum bound computation for proc list dump.
Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0dbe5.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add communicator-specific monitored collective data reset

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring scripts to the 'make dist'
Also install them in the build and the install directories.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-26 18:21:39 +02:00
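The monitoring work above adds a histogram indexed by the base-2 logarithm of message sizes, and later fixes the log10_2 constant, the index calculation and an out-of-bound write. A minimal sketch of one such bucketing scheme, assuming bucket 0 holds empty messages and bucket k (k >= 1) holds sizes in [2^(k-1), 2^k); the bucket layout, `HIST_BUCKETS` and both helpers are hypothetical, not the layout used by common/monitoring. Integer shifts sidestep the rounding pitfalls of computing logarithms from floating-point constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: bucket 0 for empty messages, then one bucket per power of two. */
#define HIST_BUCKETS 65

/* 0 for size == 0, otherwise 1 + floor(log2(size)), using integer shifts only. */
static size_t hist_bucket(size_t size)
{
    size_t b = 0;
    while (size > 0) {
        size >>= 1;
        b++;
    }
    return b;
}

/* Record one message; the index is clamped so the write stays in bounds. */
static void hist_record(size_t hist[HIST_BUCKETS], size_t size)
{
    size_t b = hist_bucket(size);
    if (b >= HIST_BUCKETS)
        b = HIST_BUCKETS - 1;
    hist[b]++;
}
```

With this scheme, 5-byte and 7-byte messages land in the same bucket (3), and a 1024-byte message lands in bucket 11.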
Nathan Hjelm
31ab83362a osc/rdma: cleanup local peer setup and fix a bug
The data endpoint was not being set correctly for local peers in some
cases. This commit fixes the bug and cleans the associated code to
simplify the logic.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-22 13:28:45 -06:00
Christoph Niethammer
7f1347677d Create file for file backed shared memory in process job session dir.
Prevents file collisions and can also be cleaned by orte-clean properly.

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
2017-06-22 08:25:34 +02:00
Ralph Castain
814e858082 Merge pull request #3696 from rhc54/topic/pmix3
Update PMIx and integration glue
2017-06-20 10:38:29 -07:00
Ralph Castain
952726c121 Update to latest PMIx master - equivalent to 2.0rc2. Update the thread support in the opal/pmix framework to protect the framework-level structures.
This now passes the loop test, and so we believe it resolves the random hangs in finalize.

Changes in PMIx master that are included here:

* Fixed a bug in the PMIx_Get logic
* Fixed self-notification procedure
* Made pmix_output functions thread safe
* Fixed a number of thread safety issues
* Updated configury to use 'uname -n' when hostname is unavailable

Work on cleaning up the event handler thread safety problem
Rarely used functions, but protect them anyway
Fix the last part of the intercomm problem
Ensure we don't cover any PMIx calls with the framework-level lock.
Protect against NULL argv comm_spawn

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-20 09:02:15 -07:00
George Bosilca
1f291c8728 Add the fragment to the unexpected frags only after extracting the pml_proc.
2017-06-20 16:03:52 +02:00
Gilles Gouaillardet
9ba85b85e1 coll/libnbc: revisit NBC_Handle usage
make NBC_Handle (almost) an internal structure created
by NBC_Schedule_request()
use a local variable instead of what was previously handle->tmpbuf

Refs open-mpi/ompi#3487

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-20 17:24:16 +09:00
Gilles Gouaillardet
68ac95003f coll/base: fix zero size datatype handling in mca_coll_base_alltoallv_intra_basic_inplace()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-20 14:36:35 +09:00
Nathan Hjelm
0c8c7e50d0 Merge pull request #3682 from hjelmn/comm_assertions
ompi: add support for new communicator info assertions
2017-06-19 09:49:59 -06:00
Edgar Gabriel
70107b3e52 Merge pull request #3703 from edgargabriel/pr/cart-comm-file-open-fix
Pr/cart comm file open fix
2017-06-15 15:03:38 -05:00
Edgar Gabriel
3b0b8fa12c io/ompio: update cartesian based grouping strategy
update the cartesian communicator based grouping strategy to match the other
algorithms used in the aggregator selection process.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-06-15 14:05:54 -05:00
Edgar Gabriel
bd6b430798 common/ompio: remove function call to cart_based_grouping
the cart_based_grouping aggregator strategy was not correctly updated
during the last major rewrite of the aggregator selection algorithm.
It is also not supposed to be called from file_open (but from
file_set_view).

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-06-15 14:04:03 -05:00
George Bosilca
e9d533e62e Fix warnings from non-debug mode.
Thanks Ralph for the report.
2017-06-13 16:57:42 -04:00
Gilles Gouaillardet
72c7329462 configury: use 'uname -n' when 'hostname' is not available
the 'hostname' command might not be available on some platforms
such as Fedora Core 26, so mimic config/libtool.m4 and fall back
to 'uname -n' if needed

Refs. #3680

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-12 15:04:32 +09:00
KAWASHIMA Takahiro
b5b6b22848 Merge pull request #3678 from kawashima-fj/pr/signal-abort-delay
Apply `opal_abort_delay` to the OPAL signal handler
2017-06-12 10:35:11 +09:00
Joshua Hursey
80a91dc244 io/romio314: Add work around support for missing MPI_File ops
* Add work around support for the following missing ops in ROMIO 3.1.4
    - `MPI_File_iread_at_all`
    - `MPI_File_iwrite_at_all`
    - `MPI_File_iread_all`
    - `MPI_File_iwrite_all`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-06-09 14:42:59 -05:00
Joshua Hursey
29609631a2 mpi/c: Protect some IO functions not widely implemented
* Protects us from segv when ROMIO 314 is selected and one of the
   following operations is called:
   - MPI_File_iread_at_all
   - MPI_File_iwrite_at_all
   - MPI_File_iread_all
   - MPI_File_iwrite_all

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-06-09 11:42:26 -05:00
Nathan Hjelm
db2204f2f3 ompi: add support for new communicator info assertions
This commit adds code to allow support for the info assertions added
by mpi-forum/mpi-issues#11. The assertions added are:
mpi_assert_no_any_tag, mpi_assert_no_any_source,
mpi_assert_exact_length, and mpi_assert_allow_overtaking.

This commit also adds support for the mpi_assert_no_any_source and
mpi_assert_allow_overtaking info keys to the ob1 pml.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-08 15:52:12 -06:00
KAWASHIMA Takahiro
362445d486 Use same prefix format for [host:pid]
Hostname and PID are output as a message prefix in many places in
our code. Their printf-formats were either `[%s:%d]` or `[%s:%05d]`.
This commit changes `[%s:%d]` to `[%s:%05d]`. The latter was more
widely used in our code (including OPAL output system and the signal
handler).

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-06-08 19:35:03 +09:00
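The two formats differ only for PIDs with fewer than five digits: `%05d` zero-pads to a minimum width of five and never truncates longer values. A minimal sketch; the `format_prefix()` helper is hypothetical and only demonstrates the printf format chosen in this commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the "[host:pid]" prefix using the zero-padded format adopted here. */
static int format_prefix(char *buf, size_t len, const char *host, int pid)
{
    return snprintf(buf, len, "[%s:%05d]", host, pid);
}
```

For example, host `node1` with PID 42 yields `[node1:00042]`, while PID 123456 yields `[node1:123456]`.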
KAWASHIMA Takahiro
6b91eddc8b Apply opal_abort_delay to the signal handler
This commit expands the effect of the MCA parameter `opal_abort_delay`
to the OPAL signal handler. This allows attaching of a debugger on
segmentation fault etc. before quitting the job.

The sleep code is moved to the `opal_delay_abort` function from the
`ompi_mpi_abort` and `oshmem_shmem_abort` functions for code cleanup.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-06-08 19:34:48 +09:00
Mark Allen
aeb2c02d2f Type_create_darray with mix of BLOCK/CYCLIC
Example (using MPI_ORDER_C so the below has 6 rows of 4 ints to parcel out)
    size = 4;
    rank = 0;
    ndims=2;
    gsizes[0] = 6;
    gsizes[1] = 4;
    distribs[0] = MPI_DISTRIBUTE_CYCLIC;
    distribs[1] = MPI_DISTRIBUTE_BLOCK;
    dargs[0] = 2;
    dargs[1] = 2;
    psizes[0] = 2;
    psizes[1] = 2;
    MPI_Type_create_darray(size, rank, ndims,
        gsizes, distribs, dargs, psizes,
        MPI_ORDER_C, MPI_INT, &mydt);

Expectation for the layout:
   inner dimension (1) is
       4 items (ints) distributed block over 2 ranks with 2 items each
       e.g. for rank 0: [ x x . . ]
   outer dimension (0) is:
       6 items (the above [ x x . .]) cyclic over 2 ranks with 2 items each
       e.g. for rank 0:
           [ x x . . ]    :  offset=0 bytes=8
           [ x x . . ]    :  offset=16 bytes=8
           [ . . . . ]
           [ . . . . ]
           [ x x . . ]    :  offset=64 bytes=8
           [ x x . . ]    :  offset=80 bytes=8

Or more specifically a stream of ints 0,1,2,3,4,5,6,7 sent into that
type should be
    [ 0 1 . . ]
    [ 2 3 . . ]
    [ . . . . ]
    [ . . . . ]
    [ 4 5 . . ]
    [ 6 7 . . ]
However, the data was being laid out as
    [ 0 1 2 3 ]
    [ . . . . ]
    [ . . . . ]
    [ . . . . ]
    [ 4 5 6 7 ]
    [ . . . . ]
because the recursive construction inside the block() function (which
creates the smaller row datatype [ x x . . ]) wasn't setting the extent
of that type.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-06-07 16:53:03 -04:00
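The expected layout in this commit message can be checked without MPI by modeling ownership directly. The sketch below is plain C, not MPI's darray constructor: it assumes that rank 0 of the 2x2 process grid has coordinates (0, 0) under MPI_ORDER_C, and the helpers `owns_cyclic()`, `owns_block()` and `darray_unpack()` are hypothetical. It places the stream 0..7 into the 6x4 array in increasing memory order, which reproduces the [ 0 1 . . ] / [ 2 3 . . ] / ... / [ 4 5 . . ] / [ 6 7 . . ] picture above.

```c
#include <assert.h>

#define ROWS  6   /* gsizes[0] */
#define COLS  4   /* gsizes[1] */
#define PROWS 2   /* psizes[0] */
#define PCOLS 2   /* psizes[1] */
#define DARG0 2   /* dargs[0], MPI_DISTRIBUTE_CYCLIC */
#define DARG1 2   /* dargs[1], MPI_DISTRIBUTE_BLOCK */

/* CYCLIC(darg): blocks of darg indices are dealt round-robin to the processes. */
static int owns_cyclic(int idx, int darg, int nprocs, int coord)
{
    return (idx / darg) % nprocs == coord;
}

/* BLOCK(darg): process `coord` owns indices [coord * darg, (coord + 1) * darg). */
static int owns_block(int idx, int darg, int coord)
{
    return idx / darg == coord;
}

/* Place a packed stream into a row-major ROWS x COLS int array the way the
 * darray type for grid coordinates (c0, c1) is expected to: owned elements
 * are filled in increasing memory order. Returns the number placed. */
static int darray_unpack(int c0, int c1, const int *stream, int dst[ROWS * COLS])
{
    int k = 0;

    assert(c0 >= 0 && c0 < PROWS && c1 >= 0 && c1 < PCOLS);
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            if (owns_cyclic(i, DARG0, PROWS, c0) && owns_block(j, DARG1, c1))
                dst[i * COLS + j] = stream[k++];
    return k;
}
```

With 4-byte ints, the first element of each owned run sits at array indices 0, 4, 16 and 20, i.e. byte offsets 0, 16, 64 and 80, matching the offsets listed in the message.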
Jeff Squyres
44aef39b24 Merge pull request #3641 from ggouaillardet/topic/fortran_strings
fortran/base: rename strings.h into fortran_base_strings.h
2017-06-05 15:31:08 -04:00
Ralph Castain
dea9ef2020 Merge pull request #3637 from hjelmn/osc_sm_info_fix
osc/sm: fix SEGV in new info usage
2017-06-05 05:45:21 -07:00
KAWASHIMA Takahiro
0cbdbe32f7 ompi/request: Support non-PML persistent requests
This commit adds the `req_start` member to the `ompi_request_t` struct.
The `MPI_START` and `MPI_STARTALL` routines call this callback function
instead of `MCA_PML_CALL(start(...))`. So components that return
persistent request must set this member to their request objects.

`mca_pml_base_module_t::pml_start` is not deleted because
`MCA_PML_CALL(start(...))` is still used elsewhere across OMPI.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-06-02 13:08:17 +09:00
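The dispatch change can be pictured with a miniature model: each persistent request carries its own start callback, so a generic start-all routine no longer assumes the request came from the PML. Everything prefixed `mini_` is hypothetical and only mirrors the idea of a per-request `req_start` member; the real `ompi_request_t` callback signature and how `MPI_STARTALL` batches requests are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

struct mini_request;
typedef int (*mini_start_fn)(size_t count, struct mini_request **reqs);

struct mini_request {
    mini_start_fn req_start;  /* set by whichever component created the request */
    int started;              /* demo bookkeeping only */
};

/* Start routine of a PML-like component. */
static int pml_like_start(size_t count, struct mini_request **reqs)
{
    for (size_t i = 0; i < count; i++)
        reqs[i]->started += 1;
    return 0;
}

/* Start routine of some other component that creates persistent requests. */
static int other_component_start(size_t count, struct mini_request **reqs)
{
    for (size_t i = 0; i < count; i++)
        reqs[i]->started += 10;
    return 0;
}

/* Generic start-all: dispatch through each request's own callback. */
static int mini_startall(size_t count, struct mini_request **reqs)
{
    for (size_t i = 0; i < count; i++) {
        int rc = reqs[i]->req_start(1, &reqs[i]);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

The design choice is the usual one for extensible dispatch: the caller stays generic, and each component decides how its own requests are started.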
KAWASHIMA Takahiro
c8d38d31c6 Merge pull request #3618 from kawashima-fj/pr/java-doc-man
java: Detect `javadoc` path and improve `mpijavac` man page
2017-06-02 10:24:05 +09:00
Gilles Gouaillardet
08526e8adc fortran/base: rename strings.h into fortran_base_strings.h
rename ompi/mpi/fortran/base/strings.h so it does not get pulled
when /usr/include/strings.h is expected.

Refs open-mpi/ompi#3639

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-02 09:46:20 +09:00
Josh Hursey
1665d771a6 Merge pull request #3635 from wlepera/fix/ibm/155305
MPI_Sendreceive_replace data error with > 2k msg
2017-06-01 14:38:01 -05:00
Jeff Squyres
d520c24f3a predefined MPI object padding: set to fixed number of bytes (#3634)
Convert the predefined MPI object padding to a fixed number of bytes
(vs. a multiple of sizeof(void*)) so that the padding is the same size
between 32 and 64 bit builds.  I.e., we won't have a situation where
we've run out of padding in 32 bit builds but still have more space
available in 64 bit builds.

Fixes #3610

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-06-01 15:28:23 -04:00
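A sketch of the padding idea, assuming a hypothetical `struct object` and a 512-byte pad; Open MPI's actual pad sizes and type names are not reproduced here. Padding to a fixed byte count, rather than a multiple of sizeof(void*), keeps the padded size identical across 32- and 64-bit builds, and because the object itself is usually smaller in a 32-bit build, that build ends up with at least as much spare room as the 64-bit one.

```c
#include <assert.h>
#include <stddef.h>

#define PREDEF_PAD_BYTES 512  /* hypothetical fixed pad size, in bytes */

/* Hypothetical internal object whose layout may grow over time. */
struct object {
    void *ops;
    int id;
    char name[64];
};

/* Fail the build if the object ever outgrows its padding. */
_Static_assert(sizeof(struct object) < PREDEF_PAD_BYTES, "object outgrew its padding");

/* The predefined instance always occupies exactly PREDEF_PAD_BYTES bytes. */
struct predefined_object {
    struct object obj;
    char padding[PREDEF_PAD_BYTES - sizeof(struct object)];
};
```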
Nathan Hjelm
d10e6455a0 osc/sm: fix SEGV in new info usage
This commit moves the info subscribe for the blocking_fence to after
the global_state is allocated and moves setting win->w_osc_module to
before the info subscribe for alloc_shared_contig. This fixes a SEGV
caught by MTT.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-01 12:32:30 -06:00
William LePera
a7c9c4aef3 MPI_Sendreceive_replace data error with > 2k msg (RTC 155305)
Signed-off-by: William LePera <lepera@us.ibm.com>
2017-06-01 13:08:58 -04:00
Gilles Gouaillardet
5e9be7667b Merge pull request #3600 from ggouaillardet/topic/osc_rdma_get_segment
osc/rdma: fix osc_rdma_get_remote_segment() length parameter
2017-06-01 13:09:14 +09:00
Nathan Hjelm
e1a997c0cb Merge pull request #3593 from hjelmn/bug_3575
osc/rdma: fix typo in ompi_osc_rdma_lock_acquire_exclusive
2017-05-31 08:54:40 -06:00
KAWASHIMA Takahiro
76b1f80664 java: Use correct date/version in mpijava man page
`mpijavac.1` should be generated at `make`-time...

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-05-31 17:24:49 +09:00
KAWASHIMA Takahiro
63f0945dcc java: Detect the path of javadoc in configure
Without this change, the directory of `javadoc` command must be
included in the `PATH` environment variable at `make`-time.
Paths of `javac`, `javah`, and `jar` commands are detected in
`configure`. So the path of `javadoc` also should be detected.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-05-31 14:26:14 +09:00
Ralph Castain
ed4078e2dd Protect against the condition where the port string is actually NULL
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-28 20:51:09 -07:00
Gilles Gouaillardet
e622ca8c1c osc/rdma: fix osc_rdma_get_remote_segment() length parameter
a buffer defined by (buf, count, dt)
will have data starting at buf+offset and ending len bytes later with
len = opal_datatype_span(&dt.super, count, &offset);

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-29 11:08:03 +09:00
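The relationship stated in this message (data starting at buf+offset and ending len bytes later) can be illustrated with a small model. The `dt_bounds` struct and `dt_span()` below are hypothetical, written to follow the description rather than copied from OPAL; zero-size types and negative extents are ignored. Every element except the last contributes a full extent, and the last contributes only its true extent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype bounds in bytes: [lb, ub) is the extent used to step
 * between consecutive elements; [true_lb, true_ub) is where data lives. */
struct dt_bounds {
    ptrdiff_t lb, ub, true_lb, true_ub;
};

/* Bytes touched by (buf, count, dt): data starts at buf + *gap and ends
 * the returned number of bytes later. */
static ptrdiff_t dt_span(const struct dt_bounds *dt, size_t count, ptrdiff_t *gap)
{
    if (count == 0) {
        *gap = 0;
        return 0;
    }
    *gap = dt->true_lb;
    return (ptrdiff_t)(count - 1) * (dt->ub - dt->lb) + (dt->true_ub - dt->true_lb);
}
```

For instance, a type whose single 4-byte block sits at displacement -4 gives a gap of -4 and a span of 4 for one element.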
Ralph Castain
9f60cd0fe7 Update the connect/accept support so we check to see if we have the proper infrastructure and RTE support, including whether we have ompi-server available if the connect/accept spans multiple applications. Print pretty help messages in all cases where we do not have support
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-27 10:47:08 -07:00
Nathan Hjelm
b83c5dbee5 osc/rdma: fix typo in ompi_osc_rdma_lock_acquire_exclusive
Fixes #3575

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-05-26 14:21:08 -06:00
Nathan Hjelm
c7e6294f31 Merge pull request #3589 from hjelmn/cxx_glue
mpi/cxx: remove nonexistent function from cxx glue
2017-05-26 11:24:05 -06:00
Nathan Hjelm
ee9093c373 mpi/cxx: remove nonexistent function from cxx glue
This commit removes a nonexistent function that was causing build
problems under certain environments.

Reference #3442

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-05-26 10:40:19 -06:00
Josh Hursey
4bfb0fcddd Merge pull request #3577 from markalle/pr/osc_rdma_rangecheck
fix for buffer length check (rdma osc w/ odd datatypes)
2017-05-26 10:44:33 -05:00
Nathan Hjelm
7d5cc8ebca Merge pull request #3572 from ggouaillardet/topic/ompi_osc_rdma_rget_accumulate_internal
osc/rdma: fix datatype extent usage in ompi_osc_rdma_rget_accumulate_…
2017-05-26 09:37:51 -06:00
Gilles Gouaillardet
47ebfaa60d Merge pull request #3451 from mkurnosov/reduce-allreduce-rebenseifner
coll: Add Rabenseifner's algorithm for Reduce and Allreduce
2017-05-26 21:00:30 +09:00
Mikhail Kurnosov
f6e2d4ab04 coll: Add Rabenseifner's algorithm for Reduce and Allreduce
A component with implementation of R. Rabenseifner's algorithm for Reduce and Allreduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather or an allgather.

Current limitations:
  -- count >= 2^{\floor{\log_2 p}}
  -- commutative operations only
  -- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>

coll/spacc: Modify implementation to use `ompi_coll_base_sendrecv()`

Replace irecv() + isend() + ompi_request_wait() with ompi_coll_base_sendrecv().

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2017-05-26 14:33:35 +07:00
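The two phases described above can be traced with a serial simulation in which each "rank" is a row of an array and every message exchange becomes an array copy. This is a sketch of the textbook algorithm for a sum, not the coll/spacc code: it assumes p is a power of two and n is divisible by p, so the extra step for other process counts is omitted, and `rabenseifner_allreduce_sim()` is a hypothetical name.

```c
#include <assert.h>

#define MAX_P 16
#define MAX_N 64

/* buf[r] holds rank r's vector; on return every buf[r] holds the global sum. */
static void rabenseifner_allreduce_sim(int p, int n, int buf[MAX_P][MAX_N])
{
    int lo[MAX_P], hi[MAX_P];   /* [lo, hi): the part of the vector rank r owns */

    assert(p > 0 && p <= MAX_P && (p & (p - 1)) == 0);
    assert(n > 0 && n <= MAX_N && n % p == 0);
    for (int r = 0; r < p; r++) {
        lo[r] = 0;
        hi[r] = n;
    }

    /* Phase 1: reduce-scatter by recursive vector halving and distance
     * doubling. Partners r and r ^ mask own the same window; the lower rank
     * keeps and reduces the lower half, the higher rank the upper half. */
    for (int mask = 1; mask < p; mask <<= 1) {
        for (int r = 0; r < p; r++) {
            int peer = r ^ mask;
            if (peer < r)
                continue;                       /* visit each pair once */
            int mid = lo[r] + (hi[r] - lo[r]) / 2;
            for (int i = lo[r]; i < mid; i++)
                buf[r][i] += buf[peer][i];
            for (int i = mid; i < hi[r]; i++)
                buf[peer][i] += buf[r][i];
            hi[r] = mid;
            lo[peer] = mid;
        }
    }

    /* Phase 2: allgather by recursive vector doubling and distance halving,
     * replaying phase 1 in reverse so partners hold adjacent windows. */
    for (int mask = p >> 1; mask >= 1; mask >>= 1) {
        for (int r = 0; r < p; r++) {
            int peer = r ^ mask;
            if (peer < r)
                continue;
            for (int i = lo[peer]; i < hi[peer]; i++)
                buf[r][i] = buf[peer][i];
            for (int i = lo[r]; i < hi[r]; i++)
                buf[peer][i] = buf[r][i];
            int new_lo = lo[r] < lo[peer] ? lo[r] : lo[peer];
            int new_hi = hi[r] > hi[peer] ? hi[r] : hi[peer];
            lo[r] = lo[peer] = new_lo;
            hi[r] = hi[peer] = new_hi;
        }
    }
}
```

After phase 1 each rank owns n/p fully reduced elements; phase 2 only copies, so each rank sends and receives roughly 2n elements in total instead of n log2 p, which is why the algorithm pays off for long vectors.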
Gilles Gouaillardet
0f79259b94 osc/rdma: use extent of the appropriate datatype in ompi_osc_rdma_rget_accumulate_internal()
origin_datatype and target_datatype might be different and hence have different extent,
so use either origin_extent or target_extent when appropriate.

Refs open-mpi/ompi#3569

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-26 13:59:38 +09:00
KAWASHIMA Takahiro
c3bbd7dfec man: Remove unnecessary empty lines
All other man pages don't have an empty line after
the "! or the older form: INCLUDE 'mpif.h'" line

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-05-26 11:17:59 +09:00
KAWASHIMA Takahiro
e57ab611cd man: Fix roff markup of variable names
These typos are found by running `grep -r '\\f[^IBRP]' ompi/mpi/man/`.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-05-26 11:17:23 +09:00
Geoff Paulsen
93078ad824 Merge pull request #3551 from markalle/1sided_some_single
1sided with some hosts single rank -- Fixes #3548
2017-05-25 13:59:48 -05:00
Josh Hursey
51b2c42d18 Merge pull request #3574 from jjhursey/fix/type_create_f90-dt
ompi/mpi: Fixes for `mpi_type_create_f90_(real|complex)`
2017-05-25 08:58:33 -05:00
Mark Allen
df14cbf039 fix for buffer length check (rdma osc w/ odd datatypes)
The osc_rdma_get_remote_segment() has the 3rd and 4th args as
* target_disp
* length
which it uses to determine if the rdma falls within the bounds of
the window or not (actually it only checks the upper bound, but I'm
okay with that).

Anyway the caller previously was passing in the length argument as
    target_datatype->super.size * target_count
which doesn't really represent the number of bytes after target_disp
for which data exists. In particular I could create a datatype as
    { disp -4, len 4 } and use target_disp 4
and that would be bytes 0-3 of the window where the original code
would think it was bytes 4-7 and could abort at the range check.

I've changed it to use the opal_datatype_span() function.
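This sketch shows why `size * count` is the wrong length. It uses a simplified, hypothetical datatype description; `dt_model_t` is not the Open MPI struct. It also assumes a positive extent and a displacement unit of 1 byte. The sketch compares a size-based range check with a span-based one on the `{ disp -4, len 4 }` example above. Unlike the code described in the commit, it checks both bounds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a datatype (not opal_datatype_t). */
typedef struct {
    ptrdiff_t true_lb;  /* offset of the first byte actually touched */
    ptrdiff_t true_ub;  /* one past the last byte actually touched */
    ptrdiff_t extent;   /* ub - lb, assumed positive here */
    size_t    size;     /* number of data bytes per element */
} dt_model_t;

/* Old style: assumes the data starts at disp and is size * count bytes. */
static bool in_window_old(const dt_model_t *dt, size_t count,
                          ptrdiff_t disp, size_t win_size) {
    size_t len = dt->size * count;
    return disp >= 0 && (size_t)disp + len <= win_size;
}

/* Span-based: the first touched byte is disp + true_lb, and the span
   covers (count - 1) full extents plus the true extent of the last one. */
static bool in_window_span(const dt_model_t *dt, size_t count,
                           ptrdiff_t disp, size_t win_size) {
    if (count == 0)
        return true;
    ptrdiff_t start = disp + dt->true_lb;
    ptrdiff_t span = (ptrdiff_t)(count - 1) * dt->extent
                   + (dt->true_ub - dt->true_lb);
    return start >= 0 && (size_t)(start + span) <= win_size;
}
```

With a 4-byte window and `target_disp` 4, the old check rejects the access as bytes 4-7. The span-based check accepts it as bytes 0-3.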

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-24 19:10:39 -04:00
Joshua Hursey
a5e9c3501b ompi/mpi: Fix MPI_UNDEFINED handling in mpi_type_create_f90_(real|complex)
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-05-24 12:10:49 -04:00
Mark Allen
36f51bca26 yalla with irregular contig datatype -- Fixes 3566
Yalla has a macro PML_YALLA_INIT_MXM_REQ_DATA that checks if a datatype
is contiguous via opal_datatype_is_contiguous_memory_layout(dt,count)
and if so it selects a size and lb that presumably describe what will be transferred via RDMA, as
            ompi_datatype_type_size(_dtype, &size); \
            ompi_datatype_type_lb(_dtype, &lb); \

This failed when I gave it a datatype constructed as [ ...] with extent 4.
What I mean by that datatype is
    lens[0] = 3;
    disps[0] = 1;
    types[0] = MPI_CHAR;
    MPI_Type_create_struct(1, lens, disps, types, &tmpdt);
    MPI_Type_create_resized(tmpdt, 0, 4, &mydt);
So there are 3 chars at offset 1, and the LB is 0 and the UB is 4.

So that macro decides that size=4 and lb=0 and later I suppose size is getting
updated to 3 for the final rdma, and so a send of a buffer
[ 0 1 2 3 ] gets recved as [ 0 1 2 _ ]. I think it should use the true lb
and the true extent.

For "regular" contig datatypes it would be the same, and for the irregular
ones that are still deemed contiguous by that utility function it should
still be the right thing to use.
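The difference between "lb + size" and "true lb + true extent" can be shown on the type above: 3 chars at offset 1, lb 0, extent 4. This is a hypothetical model, not the yalla macro. `contig_dt_t` and both copy functions are illustrative. The memcpy stands in for the actual transfer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a "contiguous" datatype with a leading hole. */
typedef struct {
    ptrdiff_t lb, extent;            /* from the resized type: 0 and 4 */
    ptrdiff_t true_lb, true_extent;  /* where data really lives: 1 and 3 */
    size_t size;                     /* data bytes per element: 3 */
} contig_dt_t;

/* Copy using lb + size: starts at the wrong byte when lb != true_lb. */
static void copy_by_lb_size(char *dst, const char *src,
                            const contig_dt_t *dt, size_t count) {
    memcpy(dst + dt->lb, src + dt->lb, dt->size * count);
}

/* Copy using the true lb and true extent. For a layout that really is
   contiguous (extent == true_extent when count > 1), the data occupies
   one block of (count - 1) * extent + true_extent bytes at true_lb. */
static void copy_by_true_extent(char *dst, const char *src,
                                const contig_dt_t *dt, size_t count) {
    if (count == 0)
        return;
    size_t len = (count - 1) * (size_t)dt->extent + (size_t)dt->true_extent;
    memcpy(dst + dt->true_lb, src + dt->true_lb, len);
}
```

Sending `[ 0 1 2 3 ]` into a receive buffer prefilled with 9s gives `[ 0 1 2 9 ]` with the first copy and `[ 9 1 2 3 ]` with the second. Only the second copy moves the bytes the datatype actually describes.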

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-23 20:56:12 -04:00
Joshua Hursey
5e302f5279 ompi/mpi: Fix parameter order in mpi_type_create_f90_(real|complex)
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-05-23 17:54:33 -04:00
Mark Allen
c9f31a8d39 fix for 1sided with some hosts single rank
See bug report
    https://github.com/open-mpi/ompi/issues/3548

If a 1sided test is launched -host hostA:2,hostB:1 some of the ranks
call allocate_state_single() and others call allocate_state_shared().
These functions were producing different values for module->state_size
but that's used when they lookup peer info from each other in
ompi_osc_rdma_peer_setup() so they need to all have matching
module->state_offset values.

This change adds a few unused bytes in the memory allocate_state_single()
creates so it matches.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-22 15:10:49 -04:00
Geoff Paulsen
50f9287c03 Merge pull request #2941 from markalle/pr/mpi-info-update2
Finally Merging this in.  MPI_*_get_info/set_info().
Targeting v3.1 release.  @hjelmn were you interested in switching some internal pieces to begin using this?  Should we target v3.1 (or whatever we call the Oct 15th release?)
2017-05-22 09:22:04 -05:00
Ryan Grant
b59eb76fcf Merge pull request #3528 from tkordenbrock/topic/mtl-portals4.mtl.rndv.get.race
mtl-portals4: in rendezvous, reissue PtlGet() if it fails
2017-05-17 12:57:45 -06:00
Mark Allen
482d84b6e5 fixes for Dave's get/set info code
The expected sequence of events for processing info during object creation
is that if there's an incoming info arg, it is opal_info_dup()ed into the obj
at obj->s_info first. Then interested components register callbacks for
keys they want to know about using opal_infosubscribe_infosubscribe().

Inside info_subscribe_subscribe() the specified callback() is called with
whatever matching k/v is in the object's info, or with the default. The
return string from the callback goes into the new k/v stored in info, and
the input k/v is saved as __IN_<key>/<val>. It's saved the same way
whether the input came from info or whether it was a default. A null return
from the callback indicates an ignored key/val, and no k/v is stored for
it, but an __IN_<key>/<val> is still kept so we still have access to the
original.

At MPI_*_set_info() time, opal_infosubscribe_change_info() is used. That
function calls the registered callbacks for each item in the provided info.
If the callback returns non-null, the info is updated with that k/v, or if
the callback returns null, that key is deleted from info. An __IN_<key>/<val>
is saved either way, and overwrites any previously saved value.

When MPI_*_get_info() is called, opal_info_dup_mpistandard() is used, which
allows relatively easy changes in interpretation of the standard, by looking
at both the <key>/<val> and __IN_<key>/<val> in info. Right now it does
  1. includes system extras, e.g. k/v defaults not explicitly set by the user
  2. omits ignored keys
  3. shows input values, not callback modifications, i.e. not the internal values

Currently the callbacks are doing things like
    return some_condition ? "true" : "false"
that is, returning static strings that are not to be freed. If the return
strings start becoming more dynamic in the future I don't see how unallocated
strings could support that, so I'd propose a change for the future that
the callback()s registered with info_subscribe_subscribe() do a strdup on
their return, and we change the callers of callback() to free the strings
it returns (there are only two callers).
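This is a minimal sketch of the proposed ownership rule under a simplified, hypothetical callback type. The callback always returns heap memory, or NULL for an ignored key, and the caller frees it. `info_cb_t`, `no_locks_cb()`, `ignore_cb()`, `apply_cb()` and `dup_str()` are illustrative names, not Open MPI API. `dup_str()` stands in for `strdup()`, which is POSIX rather than ISO C99. The "true"/"false" logic is a simplified placeholder for a real callback's condition.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: returns a heap string the caller must free,
   or NULL to mark the key as ignored. */
typedef char *(*info_cb_t)(const char *key, const char *value);

/* Stand-in for strdup(). */
static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Same shape as `return some_condition ? "true" : "false"`, but the
   result is an owned copy instead of a static string. */
static char *no_locks_cb(const char *key, const char *value) {
    (void)key;
    return dup_str(strcmp(value, "true") == 0 ? "true" : "false");
}

static char *ignore_cb(const char *key, const char *value) {
    (void)key;
    (void)value;
    return NULL;
}

/* Caller side: copy the result out, then free it. Returns 1 if a value
   was stored, 0 if the key was ignored (or allocation failed). */
static int apply_cb(info_cb_t cb, const char *key, const char *value,
                    char *out, size_t outlen) {
    char *r = cb(key, value);
    if (r == NULL)
        return 0;
    strncpy(out, r, outlen - 1);
    out[outlen - 1] = '\0';
    free(r);
    return 1;
}
```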

Rough outline of the smaller changes spread over the less central files:
  comm.c
    initialize comm->super.s_info to NULL
    copy into comm->super.s_info in comm creation calls that provide info
    OBJ_RELEASE comm->super.s_info at free time
  comm_init.c
    initialize comm->super.s_info to NULL
  file.c
    copy into file->super.s_info if file creation provides info
    OBJ_RELEASE file->super.s_info at free time
  win.c
    copy into win->super.s_info if win creation provides info
    OBJ_RELEASE win->super.s_info at free time

  comm_get_info.c
  file_get_info.c
  win_get_info.c
    change_info() if there's no info attached (shouldn't happen if callbacks
      are registered)
    copy the info for the user

The other category of change is generally addressing compiler warnings where
ompi_info_t and opal_info_t were being used a little too interchangeably. An
ompi_info_t* contains an opal_info_t*, at &(ompi_info->super).

Also this commit updates the copyrights.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-17 01:12:49 -04:00
Gilles Gouaillardet
384387bb53 Merge pull request #3411 from ggouaillardet/topic/mpi_f08_interfaces_callbacks
f08: make procedure(MPI_User_function) type available from mpi_f08
2017-05-17 09:02:26 +09:00
Jeff Squyres
39fa1d5c05 Merge pull request #3500 from bosilca/topic/any_source
Allow MPI_ANY_SOURCE in MPI_Sendrecv_replace.
2017-05-16 16:36:00 -04:00
Todd Kordenbrock
27ee862964 mtl-portals4: in rendezvous, reissue PtlGet() if it fails
This commit fixes a race condition in the rendezvous protocol.  The
race occurs because the sender does not wait for the link event on the
send buffer.  Even though this has not been seen in the wild, it is
possible for the receiver to issue the PtlGet() before the ME is
linked which causes a NAK at the receiver.  This commit resolves this
race by reissuing the PtlGet() when a NAK occurs.
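The fix can be pictured as a retry loop. This is a simplified sketch. In Portals4 the NAK is reported through an event rather than as a return value. `issue_get_fn`, `mock_get()` and the retry limit are hypothetical stand-ins, not Portals4 or Open MPI API.

```c
/* Outcome of one get attempt, as seen by the receiver. */
typedef enum { GET_OK, GET_NAK } get_result_t;

/* Hypothetical stand-in for issuing a PtlGet() and learning its outcome. */
typedef get_result_t (*issue_get_fn)(void *ctx);

/* Reissue the get while the target NAKs it (e.g. because the sender's ME
   is not linked yet). Returns the number of attempts used, or -1 if the
   illustrative retry limit is reached. */
static int get_with_retry(issue_get_fn issue_get, void *ctx, int max_tries) {
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (issue_get(ctx) == GET_OK)
            return attempt;
    }
    return -1;
}

/* Mock target that NAKs the first `naks_left` gets. */
typedef struct { int naks_left; } mock_target_t;

static get_result_t mock_get(void *ctx) {
    mock_target_t *t = ctx;
    if (t->naks_left > 0) {
        t->naks_left--;
        return GET_NAK;
    }
    return GET_OK;
}
```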

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-05-15 13:11:13 -05:00
David Solt
50aa143ab6 Major structural changes to data types: .super infosubscriber
ompi_communicator_t, ompi_win_t, ompi_file_t all have a super class of type opal_infosubscriber_t instead of a base/super type of opal_object_t (in previous code comm used c_base, but file used super).  It may be a bit bold to say that being a subscriber of MPI_Info is the foundational piece that ties these three things together, but if you object, then I would prefer to turn infosubscriber into a more general name that encompasses other common features rather than create a different super class.  The key here is that we want to be able to pass comm, win and file objects as if they were opal_infosubscriber_t, so that one routine can handle all 3 types of objects being passed to it.

MPI_INFO_NULL is still an ompi_predefined_info_t type since an MPI_Info is part of ompi but the internal details of the underlying information concept is part of opal.

An ompi_info_t type still exists for exposure to the user, but it is simply a wrapper for the opal object.

Routines such as ompi_info_dup, etc., have all been moved to the opal directory as opal_info_dup and related functions.

Fortran to C translation tables are only used for MPI_Info that is exposed to the application and are therefore part of the ompi_info_t and not the opal_info_t

The data structure changes are primarily in the following files:

    communicator/communicator.h
    ompi/info/info.h
    ompi/win/win.h
    ompi/file/file.h

The following new files were created:

    opal/util/info.h
    opal/util/info.c
    opal/util/info_subscriber.h
    opal/util/info_subscriber.c

This infosubscriber concept is that communicators, files and windows can have subscribers that subscribe to any changes in the info associated with the comm/file/window.  When xxx_set_info is called, the new info is presented to each subscriber who can modify the info in any way they want.  The new value is presented to the next subscriber and so on until all subscribers have had a chance to modify the value.  Therefore, the order of subscribers can make a difference but we hope that there is generally only one subscriber that cares or modifies any given key/value pair.  The final info is then stored and returned by a call to xxx_get_info.
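The chaining described above, where each subscriber sees the previous subscriber's output, can be modeled with a hypothetical callback list. `subscriber_fn`, `run_subscribers()` and the two toy subscribers are illustrative, not the opal_infosubscriber API. The toy subscribers append a marker so the effect of ordering is visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subscriber: may rewrite `value` (capacity `len`) in place. */
typedef void (*subscriber_fn)(const char *key, char *value, size_t len);

/* Hand the value to each subscriber in order. Each one sees the previous
   subscriber's output, so the order can change the final result. */
static void run_subscribers(const subscriber_fn *subs, size_t nsubs,
                            const char *key, char *value, size_t len) {
    for (size_t i = 0; i < nsubs; i++)
        subs[i](key, value, len);
}

/* Toy subscribers that append a marker, bounded by the buffer size. */
static void append_a(const char *key, char *value, size_t len) {
    (void)key;
    strncat(value, "a", len - strlen(value) - 1);
}

static void append_b(const char *key, char *value, size_t len) {
    (void)key;
    strncat(value, "b", len - strlen(value) - 1);
}
```

Starting from "x", the order (a, b) yields "xab" while (b, a) yields "xba". This is why the text hopes only one subscriber cares about any given key.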

The new model can be seen in the following files:

    ompi/mpi/c/comm_get_info.c
    ompi/mpi/c/comm_set_info.c
    ompi/mpi/c/file_get_info.c
    ompi/mpi/c/file_set_info.c
    ompi/mpi/c/win_get_info.c
    ompi/mpi/c/win_set_info.c

The current subscribers were changed as follows:

    mca/io/ompio/io_ompio_file_open.c
    mca/io/ompio/io_ompio_module.c
    mca/osc/rdma/osc_rdma_component.c (This one actually subscribes to "no_locks")
    mca/osc/sm/osc_sm_component.c (This one actually subscribes to "blocking_fence" and "alloc_shared_contig")

Signed-off-by: Mark Allen <markalle@us.ibm.com>

Conflicts:
	AUTHORS
	ompi/communicator/comm.c
	ompi/debuggers/ompi_mpihandles_dll.c
	ompi/file/file.c
	ompi/file/file.h
	ompi/info/info.c
	ompi/mca/io/ompio/io_ompio.h
	ompi/mca/io/ompio/io_ompio_file_open.c
	ompi/mca/io/ompio/io_ompio_file_set_view.c
	ompi/mca/osc/pt2pt/osc_pt2pt.h
	ompi/mca/sharedfp/addproc/sharedfp_addproc.h
	ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c
	ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c
	ompi/mpi/c/lookup_name.c
	ompi/mpi/c/publish_name.c
	ompi/mpi/c/unpublish_name.c
	opal/mca/mpool/base/mpool_base_alloc.c
	opal/util/Makefile.am
2017-05-12 14:41:05 -04:00
KAWASHIMA Takahiro
0650d4141f Merge pull request #3401 from kawashima-fj/pr/fortran-argv-null
fortran: Fix `MPI_ARGV(S)_NULL` compilation error
2017-05-11 11:23:12 +09:00
KAWASHIMA Takahiro
854fa5fc55 Merge pull request #3489 from kawashima-fj/pr/group-remote-peers-2nd
group: Fix `ompi_group_have_remote_peers` (2nd try)
2017-05-11 11:22:15 +09:00
Matias A Cabral
644641d06f PSM and PSM2 MTLs check on the max message size allowed by API.
OMPI send and receive messages use size_t for the length, while the PSM and PSM2
psm(2)_mq_send/receive functions use uint32_t. size_t is 64 bits on 64-bit architectures.
Therefore, this patch adds a sanity check on the length of the message
and fails gracefully.
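A minimal sketch of such a sanity check follows. `len_fits_uint32()` and `to_psm_len()` are hypothetical helpers, not the MTL code. The point is to reject a length that does not fit in uint32_t instead of letting the cast silently wrap it.

```c
#include <stddef.h>
#include <stdint.h>

/* True if a size_t length can be passed as a uint32_t without loss. */
static int len_fits_uint32(size_t len) {
    return (uint64_t)len <= (uint64_t)UINT32_MAX;
}

/* Hypothetical wrapper: convert the length or report an error (-1)
   rather than truncating, e.g. 2^32 would otherwise become 0. */
static int to_psm_len(size_t len, uint32_t *out) {
    if (!len_fits_uint32(len))
        return -1;
    *out = (uint32_t)len;
    return 0;
}
```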

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2017-05-10 12:45:11 -07:00
George Bosilca
86a7b317a5 Allow MPI_ANY_SOURCE in MPI_Sendrecv_replace.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-05-09 16:57:15 -04:00
bosilca
cbf03b3113 Topic/datatype (#3441)
* Don't overflow the internal datatype count.
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Optimize the datatype creation.
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment, or when
one calls get_elements. It saves space and makes the convertor creation a
little faster in some cases.

Rearrange the fields in the datatype description structs.

The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix the boundary computation.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* test/datatype: add test for short unpack on heterogeneous cluster

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Trying to reduce the cost of creating a convertor.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Respect the unpack boundaries.
As Gilles suggested on #2535 the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes #2535.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
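The "respect the unpack boundaries" rule amounts to bounding the work by the packed data actually provided. The following hypothetical helper illustrates this, not the logic of opal_unpack_general_function. For simplicity it counts only whole elements, whereas the real convertor can stop in the middle of an element.

```c
#include <stddef.h>

/* Unpack at most as many whole elements as the packed bytes can supply,
   even if the caller requested more. */
static size_t elements_to_unpack(size_t requested, size_t packed_bytes,
                                 size_t elem_packed_size) {
    if (elem_packed_size == 0)
        return 0;  /* nothing meaningful to unpack for zero-size elements */
    size_t available = packed_bytes / elem_packed_size;
    return requested < available ? requested : available;
}
```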
2017-05-09 09:31:40 -04:00
Gilles Gouaillardet
a66909b8b4 Merge pull request #3488 from ggouaillardet/topic/romio314_ad_nfs
romio314: ad_nfs fixes for large files from upstream mpich
2017-05-09 16:58:02 +09:00
Gilles Gouaillardet
26f44da429 coll/base: fix mca_coll_base_alltoallv_intra_basic_inplace()
correctly handle the case when a MPI task has no data to send/recv

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 15:19:14 +09:00
Gilles Gouaillardet
eaf050cfe1 romio314: adio/ad_nfs: fix buffer overflows in ADIOI_NFS_{Read,Write}Strided
Refs: models/mpich#2338
Refs: models/mpich#2617

Signed-off-by: Rob Latham <robl@mcs.anl.gov>

(back-ported from upstream commit pmodels/mpich@642db57648)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:11:12 +09:00
Gilles Gouaillardet
02af10ce6e romio314: update NFS read/write routines for large xfers
When we updated UFS and others we left NFS alone.  HDF group would like
a fix, so here we go.

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>

(back-ported from upstream commit pmodels/mpich@684df9f4c9)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:07:47 +09:00
Jeff Squyres
7185567d50 Merge pull request #3455 from jsquyres/pr/fix-lustre-configure
Lustre configure fixes
2017-05-08 16:49:23 -04:00
Ralph Castain
ef0e0171c9 Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c).
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
KAWASHIMA Takahiro
e453e42279 group: Fix ompi_group_have_remote_peers
`ompi_group_t::grp_proc_pointers[i]` may have sentinel values even
for processes which reside in the local node because the array for
`MPI_COMM_WORLD` is set up before `ompi_proc_complete_init`, which
allocates `ompi_proc_t` objects for processes that reside in the local
node, is called in `MPI_INIT`. So using `ompi_proc_is_sentinel`
against `ompi_group_t::grp_proc_pointers[i]` in order to determine
whether the process resides in a remote node is not appropriate.

This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, where sm OSC uses
`ompi_group_have_remote_peers`.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-05-08 20:28:51 +09:00
KAWASHIMA Takahiro
913adce59b Revert "group: Fix ompi_group_have_remote_peers"
2017-05-08 18:42:18 +09:00
Artem Polyakov
858d8cdff7 Merge pull request #3375 from artpol84/comm_create/master
ompi/comm: Improve MPI_Comm_create algorithm
2017-05-05 20:41:16 -07:00
Jeff Squyres
c81bc50198 fs/lustre: remove redundant/dead code
We check for liblustreapi.h in OMPI_CHECK_LUSTRE, so this code was
commented out here.  Might as well fully delete it, since it's
redundant and dead.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-05 05:28:33 -07:00
Nathan Hjelm
4676575343 Merge pull request #3410 from kawashima-fj/pr/group-remote-peers
group: Fix `ompi_group_have_remote_peers`
2017-05-04 09:20:35 -06:00
KAWASHIMA Takahiro
28281190eb Merge pull request #3402 from kawashima-fj/pr/java
mpi/java: Add missing Java binding methods
2017-04-27 15:45:49 +09:00
Yossi
f56847542e Merge pull request #3347 from alinask/topic/ucx-sync-send
PML UCX: handle a synchronous send.
2017-04-26 18:02:09 +03:00
Alina Sklarevich
49913c692a PML UCX: unite the code for all the sending modes.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-26 13:17:06 +03:00
Gilles Gouaillardet
96b00b0fcf f08: make procedure(MPI_User_function) type available from mpi_f08
Refs. open-mpi/ompi#3409

Thanks Nathan T. Weeks for the report

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-25 13:33:00 +09:00
KAWASHIMA Takahiro
f036bac4c2 group: Fix ompi_group_have_remote_peers
`ompi_group_t::grp_proc_pointers[i]` may have sentinel values even
for processes which reside in the local node because the array for
`MPI_COMM_WORLD` is set up before `ompi_proc_complete_init`, which
allocates `ompi_proc_t` objects for processes that reside in the local
node, is called in `MPI_INIT`. So using `ompi_proc_is_sentinel`
against `ompi_group_t::grp_proc_pointers[i]` in order to determine
whether the process resides in a remote node is not appropriate.

This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, where sm OSC uses
`ompi_group_have_remote_peers`.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-25 11:00:52 +09:00
Jeff Squyres
7ea05954bf Merge pull request #3399 from jsquyres/pr/add-aint-add-diff
mpif-externals.h: add missing MPI_AINT_ADD/MPI_AINT_DIFF
2017-04-24 15:47:43 -04:00
KAWASHIMA Takahiro
3699ce1f75 mpi/java: Set the given error handler to Win
Probably setting `MPI_ERRORS_RETURN` is unintentional. Probably...

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-24 16:55:13 +09:00
KAWASHIMA Takahiro
8558185c85 mpi/java: Add missing Java binding methods
This commit adds the following methods.

| Language-indep. notation | Java binding            |
| ------------------------ | ----------------------- |
| MPI_WIN_GET_ERRHANDLER   | mpi.Win.getErrhandler   |
| MPI_FILE_SET_ERRHANDLER  | mpi.File.setErrhandler  |
| MPI_FILE_GET_ERRHANDLER  | mpi.File.getErrhandler  |
| MPI_COMM_CALL_ERRHANDLER | mpi.Comm.callErrhandler |
| MPI_FILE_CALL_ERRHANDLER | mpi.File.callErrhandler |
| MPI_FILE_IREAD_AT_ALL    | mpi.File.iReadAtAll     |
| MPI_FILE_IWRITE_AT_ALL   | mpi.File.iWriteAtAll    |
| MPI_FILE_IREAD_ALL       | mpi.File.iReadAll       |
| MPI_FILE_IWRITE_ALL      | mpi.File.iWriteAll      |
| MPI_FILE_GET_ATOMICITY   | mpi.File.getAtomicity   |

`MPI_FILE_I{READ,WRITE}(_AT)_ALL` routines were added in MPI-3.1.
I don't know why other methods were missing.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-24 16:55:03 +09:00
KAWASHIMA Takahiro
0fcd96486a fortran: Fix MPI_ARGV(S)_NULL compilation error
Fortran constants `MPI_ARGV_NULL` and `MPI_ARGVS_NULL` are defined
in MPI-3.1 p.680 as below.

> `MPI_ARGVS_NULL`
>   2-dim. array of `CHARACTER*(*)`
> `MPI_ARGV_NULL`
>   array of `CHARACTER*(*)`

`MPI_ARGV_NULL` and `MPI_ARGVS_NULL` are used as an argument of
`MPI_COMM_SPAWN` and `MPI_COMM_SPAWN_MULTIPLE` respectively and
their argument `argv` and `array_of_argv` are defined as below
for `USE mpi_f08` binding in MPI-3.1.

```
CHARACTER(LEN=*), INTENT(IN) :: argv(*)
CHARACTER(LEN=*), INTENT(IN) :: array_of_argv(count, *)
```

Defining them as `INTEGER` in `mpi_f08` module will cause
a compilation error of user programs like
"There is no specific subroutine for the generic 'mpi_comm_spawn'".

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-24 13:53:12 +09:00
Ralph Castain
8b1f01dfe6 Set the default modex parameters back to full blocking modex while we continue to test and debug the slow modex - it seems to be having issues on the Cray
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-22 15:19:46 -07:00
Jeff Squyres
d32eff6ea2 mpif-externals.h: add missing MPI_AINT_ADD/MPI_AINT_DIFF
MPI_AINT_ADD and MPI_AINT_DIFF are functions and must be declared as
externals with the proper return type.  This is already done properly
in the mpi and mpi_f08 modules; these declarations for these functions
were only missing from mpif.h (i.e., mpif-externals.h).

Thanks to Aboorva Devarajan (@AboorvaDevarajan) for the bug report.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-22 08:57:54 -07:00
Gilles Gouaillardet
ebe6125750 mpi/c: MPI_PROC_NULL is not a valid rank in MPI_Win_{lock,unlock}
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-22 11:13:13 +09:00
Ralph Castain
f2ed293ecd Merge pull request #3398 from rhc54/topic/modex
Implement a background fence that collects all data during modex operation
2017-04-21 15:15:49 -07:00
Ralph Castain
9fc3079ac2 Implement a background fence that collects all data during modex operation
The direct modex operation is slow, especially at scale for even modestly connected applications. Likewise, blocking in MPI_Init while we wait for a full modex to complete takes too long. However, as George pointed out, there is a middle ground here. We could kick off the modex operation in the background, and then trap any modex_recv's until the modex completes and the data is delivered. For most non-benchmark apps, this may prove to be the best of the available options as they are likely to perform other (non-communicating) setup operations after MPI_Init, and so there is a reasonable chance that the modex will actually be done before the first modex_recv gets called.

Once we get instant-on-enabled hardware, this won't be necessary. Clearly, zero time will always out-perform the time spent doing a modex. However, this provides a decent compromise in the interim.

This PR changes the default settings of a few relevant params to make "background modex" the default behavior:

* pmix_base_async_modex -> defaults to true

* pmix_base_collect_data -> continues to default to true (no change)

* async_mpi_init - defaults to true. Note that the prior code attempted to base the default setting of this value on the setting of pmix_base_async_modex. Unfortunately, the pmix value isn't set prior to setting async_mpi_init, and so that attempt failed to accomplish anything.

The logic in MPI_Init is:

* if async_modex AND collect_data are set, AND we have a non-blocking fence available, then we execute the background modex operation

* if async_modex is set, but collect_data is false, then we simply skip the modex entirely - no fence is performed

* if async_modex is not set, then we block until the fence completes (regardless of collecting data or not)

* if we do NOT have a non-blocking fence (e.g., we are not using PMIx), then we always perform the full blocking modex operation.

* if we do perform the background modex, and the user requested the barrier be performed at the end of MPI_Init, then we check to see if the modex has completed when we reach that point. If it has, then we execute the barrier. However, if the modex has NOT completed, then we block until the modex does complete and skip the extra barrier. So we never perform two barriers in that case.
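The MPI_Init rules above can be condensed into two small decision functions. This is a sketch of the described logic, not the actual ompi_mpi_init code, and the enum and function names are hypothetical. The rule "no non-blocking fence means always blocking" is read as taking precedence over the other cases.

```c
#include <stdbool.h>

typedef enum { MODEX_BLOCKING, MODEX_BACKGROUND, MODEX_SKIP } modex_mode_t;

/* Choose how the modex runs, following the bullet list above. */
static modex_mode_t choose_modex(bool async_modex, bool collect_data,
                                 bool have_nb_fence) {
    if (!have_nb_fence)
        return MODEX_BLOCKING;   /* e.g. not using PMIx */
    if (!async_modex)
        return MODEX_BLOCKING;   /* block until the fence completes */
    return collect_data ? MODEX_BACKGROUND : MODEX_SKIP;
}

typedef enum { END_RUN_BARRIER, END_WAIT_FOR_MODEX } init_end_t;

/* With a background modex and a requested end-of-init barrier: run the
   barrier only if the modex already finished; otherwise wait for the
   modex and skip the barrier, so two barriers never happen. */
static init_end_t end_of_init(bool modex_done) {
    return modex_done ? END_RUN_BARRIER : END_WAIT_FOR_MODEX;
}
```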

HTH
Ralph

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-21 10:29:23 -07:00
Howard Pritchard
462342d148 Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Artem Polyakov
68167ec879 ompi/comm: Improve MPI_Comm_create algorithm
Force only procs that are participating in the new Comm to decide what
CID is appropriate. This has two advantages:
    * Speed up Comm creation for small communicators: non-participating procs
      will not interfere.
    * Reduce CID fragmentation: non-overlapping groups will be allowed to use
      the same CID.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-21 08:33:29 +07:00
Howard Pritchard
841192645b common/libfabric: move libfabric to ofi
This PR renames the common library for OFI libfabric from
libfabric to ofi.  There are a number of reasons this
is good to do:

1) it's shorter, replacing 9 characters with 3 in function names
   for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
   the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
   the term "ofi" internally and also in their configure options
   relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)

There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.

This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and --with-ofi-libdir.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Ralph Castain
c86f71376a Increase fine grain of timing info
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-20 00:17:40 -07:00
Gilles Gouaillardet
ded63c5e0c ompi: use ompi_coll_base_sendrecv_actual() whenever possible
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-20 10:01:28 +09:00
Gilles Gouaillardet
52551d96c1 Merge pull request #3285 from ggouaillardet/topic/coll_zerobyte_messages
coll/base: always send/recv zero-byte messages
2017-04-20 09:22:47 +09:00
Gilles Gouaillardet
fa5cd0dbe5 use ptrdiff_t instead of OPAL_PTRDIFF_TYPE
since Open MPI now requires C99, and the ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:41:56 +09:00
Gilles Gouaillardet
dcf9cca21f ompi/datatype: add the OMPI_DATATYPE_INIT_UNAVAILABLE_BASIC_TYPE macro
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:09:33 +09:00
bosilca
872cf44c28 Improve the opal_pointer_array & more (#3369)
* Complete rewrite of opal_pointer_array
Instead of a cache-oblivious linear search, use a bit array
to speed up the management of the free space. As a result we
slightly increase the memory used by the structure, but we get a
significant boost in performance.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Do not register datatypes in the f2c translation table.
The registration is now done in the Fortran layer, by
forcing a call to MPI_Type_c2f.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
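The bit-array idea from the first item can be sketched as follows. This is a hypothetical occupancy bitmap, not the opal_pointer_array implementation. A fully used 64-bit word is skipped with one comparison, so 64 slots are ruled out at a time instead of inspecting each stored pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit i set means slot i is in use. */
static void mark_used(uint64_t *bits, size_t idx) {
    bits[idx / 64] |= (uint64_t)1 << (idx % 64);
}

static void mark_free(uint64_t *bits, size_t idx) {
    bits[idx / 64] &= ~((uint64_t)1 << (idx % 64));
}

/* Return the lowest free slot, or (size_t)-1 if every slot is used. */
static size_t find_first_free(const uint64_t *bits, size_t nwords) {
    for (size_t w = 0; w < nwords; w++) {
        if (bits[w] == UINT64_MAX)
            continue;                      /* whole word occupied */
        uint64_t free_mask = ~bits[w];     /* nonzero here */
        size_t b = 0;
        while ((free_mask & ((uint64_t)1 << b)) == 0)
            b++;
        return w * 64 + b;
    }
    return (size_t)-1;
}
```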
2017-04-18 21:41:26 -04:00
Gilles Gouaillardet
23dad50d51 mpi/c: allow MPI_PROC_NULL in MPI_Win_shared_query()
This fixes a regression introduced in open-mpi/ompi@b3a20100d3

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 10:06:41 +09:00
Yossi
9ebcafd6d6 Merge pull request #3260 from derbeyn/fix_yalla
Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow
2017-04-18 11:37:48 +03:00
Alina Sklarevich
d93b67257b PML UCX: handle a synchronous send.
MCA_PML_BASE_SEND_SYNCHRONOUS

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 18:11:55 +03:00
Alina Sklarevich
eec310c99c PML/UCX/YALLA: Fix the message release call.
Set message to MPI_MESSAGE_NULL.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 14:41:13 +03:00
Gilles Gouaillardet
6886c1229a Merge pull request #3327 from jeffhammond/fix-issue-3326
check for negative ranks in ompi_win_peer_invalid
2017-04-13 10:53:32 +09:00
Ralph Castain
dadc924cde Cleanup warnings when timing is not enabled
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-11 17:29:27 -07:00
Jeff Hammond
b3a20100d3 check for negative ranks in ompi_win_peer_invalid
resolves #3326 (https://github.com/open-mpi/ompi/issues/3326)

Signed-off-by: jeff.r.hammond@intel.com
2017-04-11 14:26:16 -07:00
Nathan Hjelm
bea7d9e4f7 Merge pull request #3320 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix infinite frag allocation loop
2017-04-11 09:09:30 -06:00
Artem Polyakov
4477b87e1d Merge pull request #3303 from karasevb/timing2/master
OMPI timings
2017-04-11 07:52:40 -07:00
Boris Karasev
d132eab4a5 ompi/timings: fixed the error of opal timings env import
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-04-11 12:08:48 +06:00
Nathan Hjelm
12b52b2b2c osc/pt2pt: fix infinite frag allocation loop
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-10 16:30:47 -06:00
KAWASHIMA Takahiro
b4599d7bb7 datatype: Fix darray MPI_ACCUMULATE bug
Array sizes of `array_of_gsizes`, `array_of_distribs`, `array_of_dargs`,
and `array_of_psizes` parameters of the `ompi_datatype_create_darray`
function (and `MPI_TYPE_CREATE_DARRAY`) are all `ndims`.
`ndims` is `i[2]`, not `i[0]`. See MPI-3.1 p.122.

Because this function `__ompi_datatype_create_from_args` is used by
pt2pt OSC, using a datatype created by `MPI_TYPE_CREATE_DARRAY` for
`MPI_(R)(GET_)ACCUMULATE` caused a segmentation fault or a similar failure
on the target process.
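For reference, the MPI standard's `MPI_TYPE_GET_CONTENTS` description of `MPI_COMBINER_DARRAY` stores the integers as size, rank, ndims, then four arrays of length ndims, then order. The decoder below follows that layout. `darray_args_t` and `decode_darray()` are hypothetical names, and treating Open MPI's internal args array as the same layout is an assumption.

```c
#include <stddef.h>

typedef struct {
    int size, rank, ndims;
    const int *gsizes, *distribs, *dargs, *psizes;
    int order;
} darray_args_t;

/* Decode i[] = { size, rank, ndims, gsizes[ndims], distribs[ndims],
   dargs[ndims], psizes[ndims], order }. Reading ndims from i[0] instead
   would use `size` as the array length. */
static darray_args_t decode_darray(const int *i) {
    darray_args_t a;
    a.size     = i[0];
    a.rank     = i[1];
    a.ndims    = i[2];
    a.gsizes   = &i[3];
    a.distribs = &i[3 + a.ndims];
    a.dargs    = &i[3 + 2 * a.ndims];
    a.psizes   = &i[3 + 3 * a.ndims];
    a.order    = i[3 + 4 * a.ndims];
    return a;
}
```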

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-10 17:31:59 +09:00
Ralph Castain
95ae0d1df3 Cleanup timing macros for portability across compilers. Rename the --enable-timing configure option to be --enable-pmix-timing so it doesn't pickup external timing requests. Remove a stale function reference in PMIx so it can compile with timing enabled.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-10 12:56:38 +06:00
Howard Pritchard
f5942ff23c Merge pull request #3304 from hppritcha/topic/de-ortization-of-ompi
de-ORTEfy the ompi tree
2017-04-07 14:14:41 -06:00
Noah Evans
ef29fb13cb de-ORTEfy the ompi tree
The ompi tree should be runtime independent, but over time a few
ORTE dependent definitions and functions have escaped into the ompi
tree. I'm working on my own runtime so I've used this as an opportunity
to get rid of ORTE dependencies in the ompi/ tree. I still need to go
back and change orte to conform to the new world and these changes are
untested, but I can now compile (but not link) without orte so I'm
committing this changeset.

Signed-off-by: Noah Evans <noah.evans@gmail.com>
2017-04-07 12:35:58 -06:00
Boris Karasev
36a0e71f2d ompi/timings: preparing to production state
Adds:
- enabling/disabling of timings through the environment variable `OMPI_TIMING_ENABLE`
- output format: [file name]:[function name]:[description]: avg/min/max
- a dynamically extended array of results for the case when the initially allocated size is exhausted
- catching and collecting of errors
- cleanup

Note:
To use this feature, configure with `--enable-timings`
and set the environment variable `OMPI_TIMING_ENABLE=1`.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-04-07 21:16:57 +06:00
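The sketch below covers two of the listed features: a result array that grows when its initial capacity is exhausted, and the avg/min/max summary used in the output format. It is a minimal sketch. `timing_samples`, `timing_push`, and `timing_stats` are hypothetical names. Treating any value of `OMPI_TIMING_ENABLE` other than `"0"` as enabled is an assumption. Across ranks, the OMPI-level framework combines such values with `MPI_Reduce`; that step is left out to keep the sketch MPI-free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of timing samples. */
typedef struct {
    double *vals;
    size_t len, cap;
} timing_samples;

/* ASSUMPTION: any value other than "0" enables timings. */
static int timing_enabled(void) {
    const char *v = getenv("OMPI_TIMING_ENABLE");
    return v != NULL && strcmp(v, "0") != 0;
}

/* Append a sample, doubling the capacity when it is exhausted.
 * Returns -1 on allocation failure instead of aborting. */
static int timing_push(timing_samples *s, double v) {
    if (s->len == s->cap) {
        size_t ncap = s->cap ? 2 * s->cap : 4;
        double *p = realloc(s->vals, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        s->vals = p;
        s->cap = ncap;
    }
    s->vals[s->len++] = v;
    return 0;
}

/* Compute the avg/min/max summary over all recorded samples. */
static void timing_stats(const timing_samples *s,
                         double *avg, double *min, double *max) {
    assert(s->len > 0);
    double sum = 0.0;
    *min = *max = s->vals[0];
    for (size_t k = 0; k < s->len; k++) {
        sum += s->vals[k];
        if (s->vals[k] < *min) *min = s->vals[k];
        if (s->vals[k] > *max) *max = s->vals[k];
    }
    *avg = sum / (double)s->len;
}
```

Doubling the capacity keeps the amortized cost of each append constant. A failed `realloc` is reported to the caller rather than aborting, in the spirit of "catch and collect errors".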
Artem Polyakov
e3acf2a339 ompi/timings: add OMPI-level timing framework.
This is an extension of the OPAL timing framework that uses
MPI_Reduce to provide a compact representation of the timings
collected throughout the whole application.

NOTE: the functionality is currently disabled; it will be enabled after
runtime verification.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:22 +06:00
Artem Polyakov
1063c0d567 opal/timing: remove timings from MPI_Init and MPI_Finalize
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:21 +06:00
Nadia Derbey
f918d88c3e Fix yalla PML: Update previous commit after Yossofe's review
Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>
2017-04-06 07:58:26 +02:00
Gilles Gouaillardet
f3581c8259 coll/base: have alltoallv send/recv zero-byte messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 13:44:17 +09:00
Gilles Gouaillardet
5492edd71e coll/base: have ompi_coll_base_sendrecv() send/recv zero-byte messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 13:44:05 +09:00
Nathan Hjelm
1322e5dee8 Merge pull request #3274 from hjelmn/osc_rdma_fix
osc/rdma: fix typo in atomic code
2017-04-04 00:20:42 -06:00
Gilles Gouaillardet
5dfd4ab6ca coll/tuned: remove set-but-not-used variables
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-04 13:18:11 +09:00
Nathan Hjelm
fad0803920 osc/rdma: fix typo in atomic code
Fixes #3267

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-03 15:54:28 -06:00
Nadia Derbey
b6de94e449 Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow
Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>
2017-03-30 15:18:31 +02:00
Xin Zhao
ee952fcccd Passing estimated_num_procs to UCX init in PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Nathan Hjelm
c72fb30eb5 osc/pt2pt: fix typo
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2017-03-23 09:00:21 -06:00
Xin Zhao
6a99c60fbd Add multithreading support in PML UCX framework.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-20 19:55:00 +02:00
Jeff Squyres
ce0e1cd32c Merge pull request #3201 from hppritcha/jjhursey-topic/timer-gettimeofday
Jjhursey topic/timer gettimeofday
2017-03-18 20:12:36 -04:00
Howard Pritchard
b9331527f5 timer: hack use of clock_gettime
better solution needed later
workaround for #3003

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-18 15:08:59 -05:00
Ralph Castain
45b46dc446 Merge pull request #3181 from artpol84/add_proc_fix_2/master
ompi: Avoid unnecessary PMIx lookups when adding procs (step 2).
2017-03-16 15:06:08 -07:00
Jeff Squyres
760db0d5ce osc/pt2pt: fix compiler warning
Remove unused variable.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-16 05:46:11 -07:00
Jeff Squyres
1947280865 topo/treematch: squash some compiler warnings
Only define MIN/MAX if they are not already defined.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-16 05:44:26 -07:00
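The guarded-definition pattern this commit describes looks like the minimal sketch below. The macro bodies are an assumption, using the usual conditional-expression form; they are not copied from treematch.

```c
#include <assert.h>

/* Only define MIN/MAX when no earlier header (for example <sys/param.h>
 * on many systems) has already defined them; redefining a macro with a
 * different body triggers "macro redefined" warnings. */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif
```

Like most function-like `MIN`/`MAX` macros, these evaluate one argument twice. Avoid passing expressions with side effects.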
Joshua Hursey
48d13aa8ef mpi/c: Force wtick/wtime to use gettimeofday
* See https://github.com/open-mpi/ompi/issues/3003 for a discussion about
   this patch. Once we get a better version in place we can revert this
   change.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-03-15 21:24:37 -05:00
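The sketch below shows a `gettimeofday()`-based wall-clock timer and tick, the approach this workaround forces for `MPI_Wtime`/`MPI_Wtick`. It is a minimal sketch. `sketch_wtime` and `sketch_wtick` are hypothetical names; the real functions live in `ompi/mpi/c`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical MPI_Wtime-style helper: seconds since the epoch, as a
 * double, from gettimeofday(). */
static double sketch_wtime(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

/* gettimeofday() reports microseconds, so the tick is 1e-6 seconds. */
static double sketch_wtick(void) {
    return 1.0e-6;
}
```

`gettimeofday()` has microsecond resolution, hence the tick of `1e-6`. It follows the system wall clock, so unlike a monotonic clock it can jump if the clock is adjusted.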
Artem Polyakov
1f7a3a2d54 ompi: Avoid unnecessary PMIx lookups when adding procs (step 2).
Follow-up for 717f3fef62.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-16 07:47:27 +07:00
Nathan Hjelm
37214eda09 Merge pull request #3164 from hjelmn/ob1_pinned
pml/ob1: do not cache leave_pinned
2017-03-14 13:22:18 -06:00
Nathan Hjelm
3e7ef48c13 pml/ob1: do not cache leave_pinned
This commit fixes a bug that disabled both the RDMA pipeline and RDMA
protocols in ob1. ob1 was internally caching the values of
opal_leave_pinned and opal_leave_pinned_pipeline at init time. This is
no longer valid as opal_leave_pinned may be set by any call to a btl's
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 09:00:40 -06:00
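The sketch below contrasts the bug pattern with the fix: reading the flags once at init time versus reading them at each use. It is a simplified sketch. `sketch_leave_pinned`, `sketch_leave_pinned_pipeline`, and the functions are hypothetical stand-ins for `opal_leave_pinned`, `opal_leave_pinned_pipeline`, and ob1's protocol selection. The rule "RDMA is enabled if either flag is set" is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical globals that a later add_procs call may still change. */
static int sketch_leave_pinned = 0;
static int sketch_leave_pinned_pipeline = 0;

/* Buggy pattern: snapshot the globals once at init time. */
static int cached_lp, cached_lpp;
static void buggy_init(void) {
    cached_lp = sketch_leave_pinned;
    cached_lpp = sketch_leave_pinned_pipeline;
}
static bool buggy_rdma_enabled(void) { return cached_lp || cached_lpp; }

/* Fixed pattern: consult the globals each time a protocol is chosen. */
static bool fixed_rdma_enabled(void) {
    return sketch_leave_pinned || sketch_leave_pinned_pipeline;
}

/* Hypothetical add_procs that sets leave_pinned after init has run. */
static void sketch_add_procs(void) { sketch_leave_pinned = 1; }
```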
Valentin Petrov
fe069c9570 Fixes the coll_allgather usage bug
One should use the correct module object when calling
c_coll.coll_allgather. Otherwise there will be a segfault, for example
when hcoll is used. In that case
c_coll.coll_allgather = mca_coll_hcoll_allgather while
c_coll.coll_gather_module = tuned.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-03-14 09:47:39 +02:00
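The sketch below illustrates the rule "call a collective with the module that provides it". It is a simplified sketch. The types and functions are hypothetical and much smaller than Open MPI's `c_coll` table; the real `coll_allgather` takes the full MPI argument list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module and function-pointer table. Each function pointer
 * is paired with the module that provides it. */
typedef struct sketch_module { const char *name; } sketch_module;
typedef int (*sketch_allgather_fn)(const void *sbuf, void *rbuf,
                                   sketch_module *module);

typedef struct {
    sketch_allgather_fn coll_allgather;
    sketch_module *coll_allgather_module;
    sketch_module *coll_gather_module;
} sketch_coll;

static sketch_module hcoll_mod = {"hcoll"};
static sketch_module tuned_mod = {"tuned"};

/* hcoll-like implementation: only valid with its own module.
 * Returns -1 where real code would dereference the wrong module. */
static int hcoll_like_allgather(const void *sbuf, void *rbuf,
                                sketch_module *module) {
    (void)sbuf;
    (void)rbuf;
    return module == &hcoll_mod ? 0 : -1;
}

/* Correct call: pass the module that belongs to coll_allgather. */
static int call_allgather(sketch_coll *c, const void *sbuf, void *rbuf) {
    return c->coll_allgather(sbuf, rbuf, c->coll_allgather_module);
}
```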
Jeff Squyres
086748bb70 Merge pull request #3102 from omor1/master
Add missing definition of MPI_T_PVAR_SESSION_NULL (resolve #2652)
2017-03-13 15:27:05 -04:00
Alex Mikheev
c081239f88 ompi: pml ucx: fix persistent request init CR changes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 13:26:29 +02:00
Alex Mikheev
c113c37a7a ompi: pml ucx: fix persistent request initialization
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 10:59:41 +02:00
Nathan Hjelm
0195d15401 osc/pt2pt: flush pending fragments on lock ack
This commit addresses an issue that can occur when many
fragments are outstanding.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-06 13:58:46 -07:00
Edgar Gabriel
607dc2c039 Merge pull request #3103 from edgargabriel/pr/sharedfp-name-collision-fix
sharedfp/lockedfile and sm: fix the name collision
2017-03-05 14:46:20 -06:00
Edgar Gabriel
2d462b3b80 sharedfp/lockedfile and sm: fix name collision
This fixes the issue reported by Nicolas Joly on the mailing list: the sharedfp/lockedfile component does not currently support a scenario where multiple jobs read from the same input file, because the file names used for the sharedfp handle collide. Although not part of the original report, the same issue occurs for the sharedfp/sm component. Therefore, add the jobid to the lockedfile and sm file names.

Use the OMPI_CAST_RTE_NAME macro to determine the jobid.

Fixes: #3098

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-05 11:28:28 -06:00
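The naming fix derives the auxiliary file name from both the data file name and the job id, as in the minimal sketch below. The function name and the `"%s-%u.lockedfile"` format are hypothetical; the commit does not specify the actual name format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming scheme: include the job id so two jobs that open
 * the same input file get different shared-file-pointer files.
 * Returns 0 on success and -1 if the name does not fit in `out`. */
static int sharedfp_filename(char *out, size_t outlen,
                             const char *datafile, unsigned int jobid) {
    int n = snprintf(out, outlen, "%s-%u.lockedfile", datafile, jobid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```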
Omri Mor
20ab37a297 Add missing MPI_T_PVAR_SESSION_NULL to mpi.h
MPI_T_pvar_session_free() should reject null sessions and set *session to MPI_T_PVAR_SESSION_NULL

Signed-off-by: Omri Mor <omri50@gmail.com>
2017-03-05 09:03:30 -06:00
Artem Polyakov
9448814c40 ompi/pml/ucx: Fix uninitialized UCX request field.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-05 03:06:30 +07:00
Edgar Gabriel
d1fed77781 Merge pull request #3094 from edgargabriel/pr/master-lustre-priority
io/ompio: adjust the priority of the OMPIO component on lustre
2017-03-03 09:29:14 -06:00
KAWASHIMA Takahiro
39294caf04 Merge pull request #3086 from kawashima-fj/pr/coll-base-defs
coll: Update `ompi/mca/coll/base/coll_base_functions.h`
2017-03-03 18:53:00 +09:00
KAWASHIMA Takahiro
7cb42d9aaa Merge pull request #3085 from kawashima-fj/pr/pml-bfo-typo
pml/bfo: Correct a function name and header filenames
2017-03-03 18:48:01 +09:00
Edgar Gabriel
9e19834327 io/ompio: adjust the priority of the OMPIO component on lustre
This commit brings the behavior from the 2.x series over to master, mostly with the fork of the 3.x series in mind.
Also, use strncasecmp instead of two strncmp calls.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-02 12:10:11 -06:00
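The `strncasecmp` change means one case-insensitive comparison covers "lustre", "LUSTRE", and mixed-case spellings that separate `strncmp` calls would each need. This is a minimal sketch. `is_lustre_fs` is a hypothetical helper, and the exact strings the real check compares against are an assumption.

```c
#include <assert.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical check: one case-insensitive comparison of the first
 * 6 characters instead of separate strncmp() calls per spelling. */
static int is_lustre_fs(const char *fstype) {
    return strncasecmp(fstype, "lustre", sizeof("lustre") - 1) == 0;
}
```

Like `strncmp`, it compares only the first `n` characters. A longer string beginning with "lustre" therefore also matches.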
Jeff Squyres
dc53cd5f74 MPI_Wtick: may return a higher resolution than 10e-6 these days
Thanks to Mark Dixon (@ccaamad) for reporting the error.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-02 10:39:28 -05:00
KAWASHIMA Takahiro
c4ca5e703d coll: Update ompi/mca/coll/base/coll_base_functions.h
- Support MPI-2.2 and MPI-3.0 COLL features.

  * `MPI_REDUCE_SCATTER_BLOCK`
  * neighborhood collective communication
  * nonblocking collective communication

- Add `*_BASE_ARGS` and `*_BASE_ARG_NAMES` for convenience.

- Use parameter names used in the MPI Standard.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-03-02 17:58:02 +09:00
KAWASHIMA Takahiro
96aa0d90c1 pml/bfo: Correct a function name and header filenames
These lines were incorrectly modified in 90f2940.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-03-02 16:02:53 +09:00
Alex Mikheev
152f77df59 ompi: pml ucx: fix datatype packing error in bsend
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-01 16:18:19 +02:00
Yossi Itigin
33471c44ee pml_yalla/mtl_mxm/hcoll: open memory component to activate memory hooks.
Memory hooks are now set-up on demand. pml/yalla, mtl/mxm and
coll/hcoll need the memory hooks, so make sure those are installed.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-03-01 12:12:20 +02:00
Gilles Gouaillardet
880f2d5431 mpi/c: revamp error handling in MPI_{Pack,Unpack}[_external]
Thanks Alex and the folks at Mellanox for the help.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-01 10:03:31 +09:00
Jeff Squyres
d5266aba90 Merge pull request #2955 from jsquyres/pr/hwloc-external-fixes
Fix --with-hwloc=external
2017-02-28 14:57:07 -05:00
Josh Hursey
0006f0d7c5 Merge pull request #2773 from jjhursey/topic/hook-fwk
Add a 'hook' framework
2017-02-28 12:29:50 -06:00
Ralph Castain
735fbf8f67 Merge pull request #3011 from artpol84/add_proc_fix/master
ompi: Avoid unnecessary PMIx lookups when adding procs.
2017-02-28 08:25:08 -08:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
Jeff Squyres
0cd3b6c235 treematch: do not include <hwloc.h>
Instead, include "opal/mca/hwloc/hwloc.h"

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:45:23 -08:00
Josh Hursey
b1c4e50500 Merge pull request #2934 from jjhursey/topic/coll-comm-restructure
Move coll structure outside of the communicator
2017-02-28 08:45:18 -06:00
Nathan Hjelm
032bcf915a osc/rdma: fix compile warning
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-27 16:26:00 -07:00
George Bosilca
366d64b7e5 Move the collective structure outside the communicator.
As we changed the ABI (forcing a major release), we can limit
the size of the predefined communicators by moving the collective
structure outside the communicator. This might have a minimal,
but unnoticeable, impact on performance. This approach has been
discussed during the January 2017 devel meeting.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 11:54:17 -06:00
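Moving the collective structure out of the communicator shrinks the predefined communicators: an embedded table of function pointers is replaced by a pointer to a table allocated on demand. This is a simplified sketch. The types are hypothetical, and the 64-entry table is an arbitrary size chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table of collective function pointers. */
typedef struct { void (*fns[64])(void); } sketch_coll_table;

/* Before: the table is embedded, so every predefined communicator
 * carries the whole table in its static footprint. */
typedef struct { int cid; sketch_coll_table c_coll; } comm_embedded;

/* After: the communicator holds a pointer, and the table is allocated
 * when collectives are selected for that communicator. */
typedef struct { int cid; sketch_coll_table *c_coll; } comm_indirect;

static int comm_indirect_init(comm_indirect *comm, int cid) {
    comm->cid = cid;
    comm->c_coll = calloc(1, sizeof *comm->c_coll);
    return comm->c_coll != NULL ? 0 : -1;
}
```

The cost is one extra pointer dereference per collective call. This matches the "minimal" performance impact the message describes.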
Joshua Hursey
c10bbfded6 ompi/hook: Add the hook/license framework
 * Include a 'demo' component that shows some of the features.
 * Currently has hooks for:
   - MPI_Initialized
     - top, bottom
   - MPI_Init_thread
     - top, bottom
   - MPI_Finalized
     - top, bottom
   - MPI_Init
     - top (pre-opal_init), top (post-opal_init), error, bottom
   - MPI_Finalize
     - top, bottom
 * Other places in ompi can 'register' to hook into any one of these places
   by passing back a component structure filled with function pointers.
 * Add a `MCA_BASE_COMPONENT_FLAG_REQUIRED` flag to the MCA structure that
   is checked by the `hook` framework. If a required, static component has
   been excluded then the `hook` framework will fail to initialize.
   - See note in `opal/mca/mca.h` as to why this is checked in the `hook`
     framework and not in `opal/mca/base/mca_base_component_find.c`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 12:05:53 -05:00
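The registration idea works like this: a component fills a structure with function pointers, and the framework calls each non-NULL pointer at the matching place. This is a minimal sketch. The names and the fixed-size registry are hypothetical; only the top/bottom split for `MPI_Init` mirrors the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook component: NULL means "not interested". */
typedef struct {
    const char *name;
    void (*mpi_init_top)(void);
    void (*mpi_init_bottom)(void);
} sketch_hook_component;

#define SKETCH_MAX_HOOKS 8
static const sketch_hook_component *sketch_hooks[SKETCH_MAX_HOOKS];
static size_t sketch_nhooks = 0;

static int sketch_hook_register(const sketch_hook_component *c) {
    if (sketch_nhooks == SKETCH_MAX_HOOKS)
        return -1;
    sketch_hooks[sketch_nhooks++] = c;
    return 0;
}

/* Called at the top and bottom of a hypothetical MPI_Init wrapper. */
static void sketch_call_init_top(void) {
    for (size_t k = 0; k < sketch_nhooks; k++)
        if (sketch_hooks[k]->mpi_init_top != NULL)
            sketch_hooks[k]->mpi_init_top();
}

static void sketch_call_init_bottom(void) {
    for (size_t k = 0; k < sketch_nhooks; k++)
        if (sketch_hooks[k]->mpi_init_bottom != NULL)
            sketch_hooks[k]->mpi_init_bottom();
}

/* A 'demo'-style component that only hooks the top of MPI_Init. */
static int demo_calls = 0;
static void demo_top(void) { demo_calls += 1; }
static const sketch_hook_component demo = {"demo", demo_top, NULL};
```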
Nathan Hjelm
581bff9871 Merge pull request #3034 from hjelmn/osc_rdma_atomic
osc/rdma: make locking code more robust
2017-02-27 08:46:52 -07:00
Nathan Hjelm
4707c7c5e0 osc/rdma: make locking code more robust
Under heavy load the locking code could fail if the underlying btl
module started to return OPAL_ERR_OUT_OF_RESOURCE on atomic
operations. This commit updates the code to gracefully handle btl
errors.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-27 00:01:26 -07:00
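One common way to handle a transient `OPAL_ERR_OUT_OF_RESOURCE` gracefully is to retry the operation, as in the minimal sketch below. The return codes, the retry limit, and the mock transport are hypothetical. The commit does not say that osc/rdma retries in exactly this way.

```c
#include <assert.h>

/* Hypothetical return codes standing in for OPAL_SUCCESS and
 * OPAL_ERR_OUT_OF_RESOURCE. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_FATAL = -1, SKETCH_ERR_OUT_OF_RESOURCE = -2 };

typedef int (*sketch_atomic_fn)(void *ctx);

/* Retry while the transport reports a temporary resource shortage; any
 * other result is returned to the caller. Real code would also progress
 * the transport between attempts. */
static int sketch_atomic_with_retry(sketch_atomic_fn op, void *ctx, int max_tries) {
    int rc = SKETCH_ERR_OUT_OF_RESOURCE;
    for (int attempt = 0; attempt < max_tries; attempt++) {
        rc = op(ctx);
        if (rc != SKETCH_ERR_OUT_OF_RESOURCE)
            break;
    }
    return rc;
}

/* Mock transport: fails with OUT_OF_RESOURCE *ctx times, then succeeds. */
static int mock_busy_then_ok(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    return SKETCH_SUCCESS;
}
```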
Gilles Gouaillardet
af0b5cffb4 asm: rename AMD64 to X86_64
In this context, AMD64 really means amd64 or em64t, so
rename it to X86_64 to avoid any confusion.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-27 15:10:50 +09:00
Sylvain Jeaugey
f827b6b8dd Fix more typos using the allgather module for allreduce operations, causing a crash when CUDA collectives are enabled.
Signed-off-by: Sylvain Jeaugey <sjeaugey@nvidia.com>
Signed-off-by: Akshay Venkatesh <akvenkatesh@nvidia.com>
2017-02-24 16:35:29 -08:00
Yossi
fb67c966a8 Merge pull request #2944 from alex-mikheev/topic/pml_ucx_bsend
ompi: pml ucx: add support for the buffered send
2017-02-22 12:21:03 +02:00
Artem Polyakov
717f3fef62 ompi: Avoid unnecessary PMIx lookups when adding procs.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-02-22 16:09:30 +07:00
Alex Mikheev
b015c8bb48 ompi: pml ucx: add support for the buffered send
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-21 17:19:22 +02:00
Gilles Gouaillardet
4184c01be5 Merge pull request #2393 from bosilca/topic/no_predefined_ddt_refcount
Don't refcount the predefined datatypes.
2017-02-21 09:38:11 +09:00
Todd Kordenbrock
048f757d9f osc-portals4: add support for noncontiguous datatypes
This commit implements onesided operations for noncontiguous
datatypes using two different algorithms.

 * If the result and/or origin datatype is noncontiguous and the
   target datatype is contiguous, then an iovec MD is created for
   the result and origin.  The operation is performed using a
   single Portals4 call (unless it exceeds the max message size).
 * If the target datatype is noncontiguous, then an algorithm
   similar to the one in osc-rdma is used to loop over the
   contiguous blocks of each datatype.  The operation is
   performed using multiple Portals4 calls.

This commit ensures that individual operations do not exceed the
max atomic size or the max message size supported by the device.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-02-15 16:17:13 -06:00
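The size limit described above can be sketched as a loop that splits a contiguous range into pieces no larger than a device limit. This is a minimal sketch. `sketch_issue_chunked` and the callback are hypothetical; in osc-portals4, each piece would presumably become one Portals4 call.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sketch_issue_fn)(size_t offset, size_t length, void *ctx);

/* Hypothetical helper: issue `len` bytes as pieces of at most `max_size`
 * bytes (e.g. a max message or max atomic size) and return the count. */
static size_t sketch_issue_chunked(size_t len, size_t max_size,
                                   sketch_issue_fn issue, void *ctx) {
    assert(max_size > 0);
    size_t count = 0;
    for (size_t off = 0; off < len; off += max_size) {
        size_t n = (len - off < max_size) ? len - off : max_size;
        issue(off, n, ctx);
        count++;
    }
    return count;
}

/* Example callback: accumulate the total number of bytes issued. */
static void sum_lengths(size_t offset, size_t length, void *ctx) {
    (void)offset;
    *(size_t *)ctx += length;
}
```

For atomic operations, the limit should also be a multiple of the element size so that no element is split across two operations.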
Gilles Gouaillardet
cd4537193c osc/sm: fix MPI_Win_allocate_shared() alignment
add padding so the memory allocated by MPI_Win_allocate_shared()
is 64-byte aligned.

Thanks Joseph Schuchart for the bug report

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-15 13:40:48 +09:00
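Padding a size up to a multiple of 64 is the usual way to get this alignment. This is a minimal sketch and assumes two things: the padding is applied to each segment of a back-to-back layout, and the base address is itself 64-byte aligned. `sketch_align_up` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `align` (a power of two), so a
 * segment placed after it starts on an `align`-byte boundary. */
static size_t sketch_align_up(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```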
Josh Hursey
0b273c2561 Merge pull request #2808 from jjhursey/fix/ibm/reduce-local-to-coll
coll: Move reduce_local into the coll framework
2017-02-14 15:54:15 -06:00
Nathan Hjelm
cc4a0fabcf Merge pull request #2727 from hjelmn/osc_rdma
osc/rdma: fix typo in check for MPI_MODE_NOCHECK
2017-02-14 10:50:33 -07:00
Joshua Hursey
78006f93a4 coll: Move reduce_local into the coll framework
 * Since we are adding a new function to `mca_coll_base_module_2_1_0_t`,
   we need to increase the version of the module structure to `2_2_0`.
 * Add a comment just above the PREDEFINED_COMMUNICATOR_PAD describing
   its purpose and when it should change, to help future developers
   trying to answer the question noted in the comment.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-14 08:56:07 -06:00
Gilles Gouaillardet
e70a30cca4 coll/libnbc: optimize zero size ialltoall{v,w} with MPI_IN_PLACE
This also avoids a malloc(0) call.

Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#2945

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-13 15:21:28 +09:00
Gilles Gouaillardet
12949547f4 coll/libnbc: fix a2aw_sched_linear() with zero size datatype or zero count
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-13 15:21:28 +09:00
Joshua Hursey
383330a50d coll/basic: Expand check for negative input values
 * Negative values are parameter errors for neighborhood collectives
   - Add checks to the mpi/c interface `MPI_PARAM_CHECK`
 * Fix a success check for neighbor_alltoallw with dist_graph

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-08 14:26:32 -06:00
Geoff Paulsen
4917e44a7d Merge pull request #2832 from jjhursey/topic/ibm/osc-base-dt-abort
osc/base: Detect unsupported data types and abort
2017-02-05 04:26:04 -06:00
Howard Pritchard
f4ad119693 Merge pull request #2914 from hppritcha/topic/nbc_compiler_warning
swat some compiler warnings
2017-02-04 11:56:52 -05:00
Howard Pritchard
acaecb2448 swat some compiler warnings
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-02-03 08:28:15 -07:00
Gilles Gouaillardet
e879d2910a coll/tuned: make coll_tuned_gather_algorithms MCA settable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-02 11:00:38 +09:00
Nathan Hjelm
362ac8b87e osc/pt2pt: fix threading issues
This commit fixes a number of threading issues discovered in
osc/pt2pt. This includes:

 - Lock the synchronization object not the module in osc_pt2pt_start.
   This fixes a race between the start function and processing post
   messages.

 - Always lock before calling cond_broadcast. Fixes a race between
   the waiting thread and signaling thread.

 - Make all atomically updated values volatile.

 - Make the module lock recursive to protect against some deadlock
   conditions. Will roll this back once the locks have been
   re-designed.

 - Mark incoming complete *after* completing an accumulate not
   before. This was causing an incorrect answer under certain
   conditions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-01 10:33:01 -07:00
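The "always lock before calling cond_broadcast" rule looks like this in plain pthreads: the signaling thread updates the flag and broadcasts while holding the mutex, and the waiter re-checks the flag in a loop. This is a minimal sketch. `sketch_sync` and the functions are hypothetical; osc/pt2pt presumably uses Open MPI's own locking wrappers rather than raw pthreads.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical completion flag guarded by a mutex/condition pair. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int complete;
} sketch_sync;

static void *sketch_waiter(void *arg) {
    sketch_sync *s = arg;
    pthread_mutex_lock(&s->lock);
    while (!s->complete)                 /* re-check guards against spurious wakeups */
        pthread_cond_wait(&s->cond, &s->lock);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Update the flag and broadcast while holding the lock, so the waiter
 * cannot test the flag and then block after the wakeup was sent. */
static void sketch_signal(sketch_sync *s) {
    pthread_mutex_lock(&s->lock);
    s->complete = 1;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}
```

Setting the flag under the mutex is what prevents a lost wakeup. Holding the mutex across the broadcast as well keeps the rule simple to follow.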