1
1
Граф коммитов

6449 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
081f9bc8db
Use OPAL random generator.
This fix is related to issue #1877, and prevents the OMPI library from
messing the user level random values.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-07-26 11:54:23 -04:00
George Bosilca
fbe6c22b90
Make sure the gather is called in all cases, and not
simply based on some local state. This is the second
part of the patch proposed for open-mpi/ompi#1183.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-07-26 11:52:47 -04:00
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Boris Karasev
d917d54ddc configure: detect UCX support by default
Adds detecting UCX from following paths: "/usr /usr/local /opt/ucx"

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-07-25 23:48:49 +03:00
Joshua Ladd
8f5cb4c459 Merge pull request #3690 from xinzhao3/topic/ompi-osc-ucx
Topic/ompi-osc-ucx: Add ucx implementation for ompi osc
2017-07-25 16:26:45 -04:00
Edgar Gabriel
ca1462a889 common/ompio: adjust location of fcoll_base_file_select
adjust the location on where the fcoll_base_file_select function is
colled to ensure that all fs level parameters are correctly set.

io/ompio: minor fixes to initialization of the stripe_size and an if statement in the
simple_grouping option.

Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
2017-07-25 10:43:38 -05:00
Edgar Gabriel
450ccd439b fcoll/base: adjust selection table
adjust the fcoll selection table to achieve the following:
 - two_phase should not advertise itself on lustre file systems
 - two_phase should advertise itself on sequential file systems (stripe_size == 0 )
 - priority for dynamic, static and individual is reduced. This will lead to
   two_phase being selected in scenarios where two or more components indicate
   willingness to run.

Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
2017-07-25 10:37:22 -05:00
Gilles Gouaillardet
e054f870a8 Merge pull request #3901 from Zzzoom/nbc_remove_progress_lock
coll/libnbc: demote progress_lock to regular flag
2017-07-25 09:24:08 +09:00
Carlos Bederián
1767b218fb coll/libnbc: demote progress_lock to regular flag
Signed-off-by: Carlos Bederián <bc@famaf.unc.edu.ar>
2017-07-24 20:19:55 -03:00
Gilles Gouaillardet
60aa9cfcb6 hwloc: add support for hwloc v2 API
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:44 +09:00
Xin Zhao
2aa5292dbf Add UCX component for ompi/mca/osc for MPI one-sided communication.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-07-19 19:45:40 +03:00
Edgar Gabriel
b363a0f4db Merge pull request #3908 from edgargabriel/pr/lustre-header-update
fs/lustre: update lustre header file used in the component
2017-07-18 09:57:11 -05:00
Gilles Gouaillardet
1b46fe2d9a pml/ob1: fix mca_pml_ob1_progress_needed usage
correctly use OPAL_ATOMIC_ADD32() that returns the *new* value
and *not* the previous one.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-18 09:30:57 +09:00
Nathan Hjelm
5a5edfce88 Merge pull request #3876 from hppritcha/topic/add_psm2_get_stats
mtl/psm2: add pvar support for PSM2 MQ stats
2017-07-17 16:02:51 -05:00
Edgar Gabriel
13b14f5efe Merge pull request #3906 from edgargabriel/pr/lazy_open_fix
common/ompio: fix the lazy_open flag
2017-07-17 10:58:13 -05:00
Edgar Gabriel
bc8f642211 fs/lustre: update lustre header file used in the component
liblustreapi.h is at this point deprecated. Switch to lustreapi.h instead

fixes issue #3223

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-07-17 10:33:52 -05:00
Edgar Gabriel
8e17827a13 common/ompio: fix the lazy_open flag
fixes an erroneous error code being returned when activating
the mca_io_ompio_sharedfp_lazy_open flag with MPI_MODE_APPEND.

fixes issue #3904

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-07-17 09:44:34 -05:00
Edgar Gabriel
4bdddfb74b io/ompio: fix grouping option
changing the value of mca_io_ompio_grouping_option lead to a segfault due to
a double-free problem. Remove the erroneous free statements that have been introduced
and add a note ensuring that we are not re-adding them back at that spot.

fixes issue #3903

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-07-17 09:38:10 -05:00
Yossi Itigin
b0692c6836 Merge pull request #3833 from yosefe/topic/pml-yalla-dt-size-fix
pml/yalla: fix getting size of a continuous type.
2017-07-16 11:19:16 +03:00
Howard Pritchard
701a1d0218 mtl/psm2: add pvar support for PSM2 MQ stats
Add pvars for PSM2 MQ stats to help in analyzing performance
of Omnipath.

Tested (modestly) using modified OSU pt2pt benchmarks.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-14 10:31:35 -06:00
Ryan Grant
0ce8590e7c Merge pull request #3837 from tkordenbrock/topic/master/get.retry.timeout
master: mtl-portals4: add timeout to rendezvous get fragments
2017-07-13 09:59:54 -06:00
Nathan Hjelm
6fb81f20e4 mtl/psm2: create mca variables to shadow PSM2 environment variables
This commit enables MCA support for the following PSM2 environment
variables: PSM2_DEVICES, PSM2_MEMORY, PSM2_MQ_SENDREQS_MAX,
PSM2_MQ_RECVREQS_MAX, PSM2_MQ_RNDV_HFI_THRESH,
PSM2_MQ_RNDV_SHM_THRESH, PSM2_RCVTHREAD, PSM2_SHAREDCONTEXTS,
PSM2_SHAREDCONTEXTS_MAX, and PSM2_TRACEMASK. These variable can be set
by MCA if they are not already set in the environment.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-13 09:48:46 -06:00
Christoph Niethammer
e1fc6ae304 Change filename for shared_fm file to include comm cid instead of masterjobid.
Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
2017-07-13 01:38:40 +02:00
Jeff Squyres
ccf17808b6 Merge pull request #3258 from markalle/pr/symbol_name_pollution
symbol name pollution
2017-07-12 16:19:25 -05:00
Nathan Hjelm
3c0e94afab mpi/neighbor_allgatherv: fix copy&paste error and add helpers
This commit adds a helper function to get the inbound and outbound
neighbor count and updates the neighbor_allgatherv bindings to use the
correct count when checking the input parameters.

Fixes #2324

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-12 12:54:24 -06:00
Nathan Hjelm
bf1c863b96 osc/pt2pt: make progress in flush*_local
There is no reason not to progress OSC during the MPI_Win_flush_local
and MPI_Win_flush_all_local calls. This fixes #3750.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-12 09:09:58 -06:00
Gilles Gouaillardet
7a866f754c topo/treematch: fix topo_treematch_distgraph_create
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Nathan Hjelm
e73ab93ebf pml/ob1: do not access fragment after calling btl rget
This commit fixes a bug that occurs when the btl callback happens before
the rget returns. In this case the fragment has been returned and is no
longer valid. This commit saves the size before calling rget. This is
valid since the BTL is not allowed to change the read size.

Fixes #3821

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-11 15:59:40 -06:00
Mark Allen
552216f9ba scripted symbol name change (ompi_ prefix)
Passed the below set of symbols into a script that added ompi_ to them all.

Note that if processing a symbol named "foo" the script turns
    foo  into  ompi_foo
but doesn't turn
    foobar  into  ompi_foobar

But beyond that the script is blind to C syntax, so it hits strings and
comments etc as well as vars/functions.

    coll_base_comm_get_reqs
    comm_allgather_pml
    comm_allreduce_pml
    comm_bcast_pml
    fcoll_base_coll_allgather_array
    fcoll_base_coll_allgatherv_array
    fcoll_base_coll_bcast_array
    fcoll_base_coll_gather_array
    fcoll_base_coll_gatherv_array
    fcoll_base_coll_scatterv_array
    fcoll_base_sort_iovec
    mpit_big_lock
    mpit_init_count
    mpit_lock
    mpit_unlock
    netpatterns_base_err
    netpatterns_base_verbose
    netpatterns_cleanup_narray_knomial_tree
    netpatterns_cleanup_recursive_doubling_tree_node
    netpatterns_cleanup_recursive_knomial_allgather_tree_node
    netpatterns_cleanup_recursive_knomial_tree_node
    netpatterns_init
    netpatterns_register_mca_params
    netpatterns_setup_multinomial_tree
    netpatterns_setup_narray_knomial_tree
    netpatterns_setup_narray_tree
    netpatterns_setup_narray_tree_contigous_ranks
    netpatterns_setup_recursive_doubling_n_tree_node
    netpatterns_setup_recursive_doubling_tree_node
    netpatterns_setup_recursive_knomial_allgather_tree_node
    netpatterns_setup_recursive_knomial_tree_node
    pml_v_output_close
    pml_v_output_open
    intercept_extra_state_t
    odls_base_default_wait_local_proc
    _event_debug_mode_on
    _evthread_cond_fns
    _evthread_id_fn
    _evthread_lock_debugging_enabled
    _evthread_lock_fns
    cmd_line_option_t
    cmd_line_param_t
    crs_base_self_checkpoint_fn
    crs_base_self_continue_fn
    crs_base_self_restart_fn
    event_enable_debug_output
    event_global_current_base_
    event_module_include
    eventops
    sync_wait_mt
    trigger_user_inc_callback
    var_type_names
    var_type_sizes

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-07-11 02:13:23 -04:00
Todd Kordenbrock
5ecd905358 mtl/portals4: move opal_timer_base_get_usec() out of the fast path
Rearrange the receive frag timeout logic to avoid calling
opal_timer_base_get_usec() in read_msg().  Instead set it at the first
retry.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-07-09 22:12:45 -05:00
Todd Kordenbrock
37766d770d mtl/portals4: if frag retry fails, then fail the entire receive
If the a frag cannot be retried because the ni_fail_type is other than
PTL_NI_DROPPED, then set the return type and jump to callback_error.
This sets MPI_ERROR and completes the receive.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-07-09 22:12:31 -05:00
Piotr Lesnicki
99453e6b10 mtl/portals4: get retransmission REPLY code
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-07-09 22:12:25 -05:00
Piotr Lesnicki
06b15cebbf mtl/portals4: add timeout to get retransmit
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-07-09 22:12:08 -05:00
Yossi Itigin
0522179efc pml/yalla: use opal_datatype_span() to get config type length.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-07-10 01:25:42 +03:00
Yossi Itigin
e94c6b16f0 pml/yalla: fix getting size of a continuous type.
pull request #3765 introduced a bug where the extent of a type is used
instead of its size.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-07-07 19:03:54 +03:00
Gilles Gouaillardet
fc11c37223 Merge pull request #3646 from ggouaillardet/spacc-fix-coverity-warnings
coll/spacc: misc fixes
2017-07-06 11:39:14 +09:00
Mikhail Kurnosov
44acc92104 Fix buffer overflow
Add check for bounds of sindex[] and rindex[].

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2017-07-06 10:49:08 +09:00
Gilles Gouaillardet
5fceca235b coll/spacc: silence more coverity warnings in mca_coll_spacc_allreduce_intra_redscat_allgather()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-06 10:49:08 +09:00
Mikhail Kurnosov
2f0f476642 Silence spacc coverity warnings
1. Add assert for opal_hibit return value: comm_size is always > 1.
2. Modified verbose output (dead-code warning).

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2017-07-06 10:49:08 +09:00
Ralph Castain
31130a4bee Replace syntax with something less strictly C99
Fixes #3809

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-05 16:54:36 -07:00
Gilles Gouaillardet
d1c5955b73 coll/base: optimize handling of zero-byte datatypes in mca_coll_base_alltoallv_intra_basic_inplace()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-30 09:47:08 +09:00
Gilles Gouaillardet
7e5e5fe887 Merge pull request #3719 from ggouaillardet/topic/libnbc_revamp
coll/libnbc: revisit NBC_Handle usage
2017-06-29 11:13:58 +09:00
Nathan Hjelm
022c658bbf osc/rdma: rework locking code to improve behavior of unlock
This commit changes the locking code to allow the lock release to be
non-blocking. This helps with releasing the accumulate lock which may
occur in a BTL callback.

Fixes #3616

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-27 15:29:51 -06:00
George Bosilca
f8ffec926e
Protect the monitoring infrastructure initialization. 2017-06-27 18:35:24 +02:00
Clément FOYER
c885ee3f3c Fix Coverity warning CID 1413323 (#3764)
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
2017-06-27 12:39:31 +02:00
bosilca
d55b666834 Topic/monitoring (#3109)
Add a monitoring PML, OSC and IO. They track all data exchanges between processes,
with capability to include or exclude collective traffic. The monitoring infrastructure is
driven using MPI_T, and can be tuned of and on any time o any communicators/files/windows.
Documentations and examples have been added, as well as a shared library that can be
used with LD_PRELOAD and that allows the monitoring of any application.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>


* add ability to querry pml monitorinting results with MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix a convertion problem and add a comment about the lack of component
retain in the new component infrastructure.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Allow the pvar to be written by invoking the associated callback.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Various fixes for the monitoring.
Allocate all counting arrays in a single allocation
Don't delay the initialization (do it at the first add_proc as we
know the number of processes in MPI_COMM_WORLD)

Add a choice: with or without MPI_T (default).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Cleanup for the monitoring module.
Fixed few bugs, and reshape the operations to prepare for
global or communicator-based monitoring. Start integrating
support for MPI_T as well as MCA monitoring.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Adding documentation about how to use pml_monitoring component.

Document present the use with and without MPI_T.
May not reflect exactly how it works right now, but should reflects
how it should work in the end.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change rank into MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c.
Change mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Improve monitoring support (including integration with MPI_T)

Use mca_pml_monitoring_enable to check status state. Set mca_pml_monitoring_current_filename iif parameter is set
Allow 3 modes for pml_monitoring_enable_output: - 1 : stdout; - 2 : stderr; - 3 : filename
Fix test : 1 for differenciated messages, >1 for not differenciated. Fix output.
Add documentation for pml_monitoring_enable_output parameter. Remove useless parameter in example
Set filename only if using mpi tools
Adding missing parameters for fprintf in monitoring_flush (for output in std's cases)
Fix expected output/results for example header
Fix exemple when using MPI_Tools : a null-pointer can't be passed directly. It needs to be a pointer to a null-pointer
Base whether to output or not on message count, in order to print something if only empty messages are exchanged
Add a new example on how to access performance variables from within the code
Allocate arrays regarding value returned by binding

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add overhead benchmark, with script to use data and create graphs out of the results
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix segfault error at end when not loading pml
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Start create common monitoring module. Factorise version numbering
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix microbenchmarks script
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Improve readability of code

NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add osc monitoring component

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add error checking if running out of memory in osc_monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Resolve brutal segfault when double freeing filename
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Moving to ompi/mca/common the proper parts of the monitoring system
Using common functions instead of pml specific one. Removing pml ones.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add calls to record monitored data from osc. Use common function to translate ranks.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix test_overhead benchmark script distribution

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix linking library with mca/common

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add passive operations in monitoring_test

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix from rank calculation. Add more detailed error messages

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix osc_monitoring mget_message_count function call

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change common_monitoring function names to respect the naming convention. Move to common_finalize the common parts of finalization. Add some comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring common output system

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add error message when trying to flush to a file, and open fails. Remove erroneous info message when flushing wereas the monitoring is already disabled.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Consistent output file name (with and without MPI_T).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Always output to a file when flushing at pvar_stop(flush).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update the monitoring documentation.
Complete informations from HowTo. Fix a few mistake and typos.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use the world_rank for printf's.
Fix name generation for output files when using MPI_T. Minor changes in benchmarks starting script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add security check for unique initialization for osc monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean the amout of symbols available outside mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Remove use of __sync_* built-ins. Use opal_atomic_* instead.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Allocate the hashtable on common/monitoring component initialization. Define symbols to set the values for error/warning/info verbose output. Use opal_atomic instead of built-in function in osc/monitoring template initialization.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Deleting now useless file : moved to common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram ditribution of message sizes

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram array of 2-based log of message sizes. Use simple call to reset/allocate arrays in common_monitoring.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add informations in dumping file. Separate per category (pt2pt/osc/coll (to come)) monitored data

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add coll component for collectives communications monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix warning messages : use c_name as the magic id is not always defined. Moreover, there was a % missing. Add call to release underlying modules. Add debug info messages. Add warning which may lead to further analysis.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix log10_2 constant initialization. Fix index calculation for histogram array.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add debug info messages to follow more easily initialization steps.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Group all the var/pvar definitions to common_monitoring. Separate initial filename from the current on, to ease its lifetime management. Add verifications to ensure common is initialized once only. Move state variable management to common_monitoring.
monitoring_filter only indicates if filtering is activated.
Fix out of range access in histogram.
List is not used with the struct mca_monitoring_coll_data_t, so heritate only from opal_object_t.
Remove useless dead code.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix invalid memory allocation. Initialize initial_filename to empty string to avoid invalid read in mca_base_var_register.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Don't install the test scripts.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix missing procs in hashtable. Cache coll monitoring data.
    * Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
    * Cache monitoring data relative to collectives operations on creation.
    * Remove double caching.
    * Use same proc name definition for hash table when inserting and
      when retrieving.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use intermediate variable to avoid invalid write while retrieving ranks in hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add missing release of the last element in flush_all. Add release of the hashtable in finalize.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use a linked list instead of a hashtable to keep tracks of communicator data. Add release of the structure at finalize time.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Set world_rank from hashtable only if found

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use predefined symbol from opal system to print int

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Move collective monitoring data to a hashtable. Add pvar to access the monitoring_coll_data. Move functions header to a private file only to be used in ompi/mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix pvar registration. Use OMPI_ERROR isntead of -1 as returned error value. Fix releasing of coll_data_t objects. Affect value only if data is found in the hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add automated check (with MPI_Tools) of monitoring.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix procs list caching in common_monitoring_coll_data_t

    * Fix monitoring_coll_data type definition.
    * Use size(COMM_WORLD)-1 to determine max number of digits.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix coll operations monitoring. Update check_monitoring accordingly to the added pvar. Fix monitoring array allocation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Documentation update.
Update and then move the latex and README documentation to a more logical place

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof. Unify OSC related outputs.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add the use of a machine file for overhead benchmark

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Check for out-of-bound write in histogram

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix common_monitoring_cache object init for MPI_COMM_WORLD

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add RDMA benchmarks to test_overhead
Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add computation of average and median of overheads. Add comments and copyrigths to the test_overhead script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add technical documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adapt to the new definition of communicators

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update expected output in test/monitoring/monitoring_test.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add dumping histogram in edge case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add consistency in header inclusion.
Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
Add sanity check before emptying hashtable.
Fix typos in documentation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* misc monitoring fixes

* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am
and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

* Cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Changing int64_t to size_t.
Keep the size_t used accross all monitoring components.
Adapt the documentation.
Remove useless MPI_Request and MPI_Status from monitoring_test.c.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add parameter for RMA test case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean the maximum bound computation for proc list dump.
Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0dbe5.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add communicator-specific monitored collective data reset

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring scripts to the 'make dist'
Also install them in the build and the install directories.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-26 18:21:39 +02:00
Nathan Hjelm
31ab83362a osc/rdma: cleanup local peer setup and fix a bug
The data endpoint was not being set correctly for local peers in some
cases. This commit fixes the bug and cleans the associated code to
simplify the logic.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-22 13:28:45 -06:00
Christoph Niethammer
7f1347677d Create file for file backed shared memory in process job session dir.
Prevents file collisions and can also be cleaned by orte-clean properly.

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
2017-06-22 08:25:34 +02:00
George Bosilca
1f291c8728
Add the fragment to the unexpected frags only after extracting the
pml_proc.
2017-06-20 16:03:52 +02:00
Gilles Gouaillardet
9ba85b85e1 coll/libnbc: revisit NBC_Handle usage
make NBC_Handle (almost) an internal structure created
by NBC_Schedule_request()
use a local variable instead of what was previously handle->tmpbuf

Refs open-mpi/ompi#3487

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-20 17:24:16 +09:00
Gilles Gouaillardet
68ac95003f coll/base: fix zero size datatype handling in mca_coll_base_alltoallv_intra_basic_inplace()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-20 14:36:35 +09:00
Nathan Hjelm
0c8c7e50d0 Merge pull request #3682 from hjelmn/comm_assertions
ompi: add support for new communicator info assertions
2017-06-19 09:49:59 -06:00
Edgar Gabriel
70107b3e52 Merge pull request #3703 from edgargabriel/pr/cart-comm-file-open-fix
Pr/cart comm file open fix
2017-06-15 15:03:38 -05:00
Edgar Gabriel
3b0b8fa12c io/ompio: update cartesian based grouping strategy
update the cartesian communicator based grouping strategy to match the other
algorithms used in the aggregator selection process.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-06-15 14:05:54 -05:00
Edgar Gabriel
bd6b430798 common/ompio: remove function call to cart_based_grouping
the cart_based_grouping aggregator strategy was not correctly updated
during the last major rewrite of the aggregator selection algorithm.
It is also not supposed to be called from file_open (but from
file_set_view).

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-06-15 14:04:03 -05:00
George Bosilca
e9d533e62e
Fix warnings from non-debug mode.
Thanks Ralph for the report.
2017-06-13 16:57:42 -04:00
Joshua Hursey
80a91dc244 io/romio314: Add work around support for missing MPI_File ops
* Add work around support for the following missing ops in ROMIO 3.1.4
    - `MPI_File_iread_at_all`
    - `MPI_File_iwrite_at_all`
    - `MPI_File_iread_all`
    - `MPI_File_iwrite_all`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-06-09 14:42:59 -05:00
Nathan Hjelm
db2204f2f3 ompi: add support for new communicator info assertions
This commit adds code to allow support for the info assertions added
by mpi-forum/mpi-issues#11. The assertions added are:
mpi_assert_no_any_tag, mpi_assert_no_any_source,
mpi_assert_exact_length, and mpi_assert_allow_overtaking.

This commit also adds support for the mpi_assert_no_any_source and
mpi_assert_allow_overtaking info keys to the ob1 pml.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-08 15:52:12 -06:00
KAWASHIMA Takahiro
0cbdbe32f7 ompi/request: Support non-PML persistent requests
This commit adds the `req_start` member to the `ompi_request_t` struct.
The `MPI_START` and `MPI_STARTALL` routines call this callback function
instead of `MCA_PML_CALL(start(...))`. So components that return
persistent request must set this member to their request objects.

`mca_pml_base_module_t::pml_start` is not deleted because
`MCA_PML_CALL(start(...))` is still used elsewhere across OMPI.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-06-02 13:08:17 +09:00
Nathan Hjelm
d10e6455a0 osc/sm: fix SEGV in new info usage
This commit moves the info subscribe for the blocking_fence to after
the global_state is allocated and moves setting win->w_osc_module to
before the info subscribe for alloc_shared_contig. This fixes a SEGV
caught by MTT.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-01 12:32:30 -06:00
Gilles Gouaillardet
5e9be7667b Merge pull request #3600 from ggouaillardet/topic/osc_rdma_get_segment
osc/rdma: fix osc_rdma_get_remote_segment() length parameter
2017-06-01 13:09:14 +09:00
Nathan Hjelm
e1a997c0cb Merge pull request #3593 from hjelmn/bug_3575
osc/rdma: fix typo in ompi_osc_rdma_lock_acquire_exclusive
2017-05-31 08:54:40 -06:00
Ralph Castain
ed4078e2dd Protect against the condition where the port string is actually NULL
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-28 20:51:09 -07:00
Gilles Gouaillardet
e622ca8c1c osc/rdma: fix osc_rdma_get_remote_segment() length parameter
a buffer defined by (buf, count, dt)
will have data starting at buf+offset and ending len bytes later with
len = opal_datatype_span(&dt.super, count, &offset);

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-29 11:08:03 +09:00
Ralph Castain
9f60cd0fe7 Update the connect/accept support so we check to see if we have the proper infrastructure and RTE support, including whether we have ompi-server available if the connect/accept spans multiple applications. Print pretty help messages in all cases where we do not have support
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-27 10:47:08 -07:00
Nathan Hjelm
b83c5dbee5 osc/rdma: fix typo in ompi_osc_rdma_lock_acquire_exclusive
Fixes #3575

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-05-26 14:21:08 -06:00
Josh Hursey
4bfb0fcddd Merge pull request #3577 from markalle/pr/osc_rdma_rangecheck
fix for buffer length check (rdma osc w/ odd datatypes)
2017-05-26 10:44:33 -05:00
Nathan Hjelm
7d5cc8ebca Merge pull request #3572 from ggouaillardet/topic/ompi_osc_rdma_rget_accumulate_internal
osc/rdma: fix datatype extent usage in ompi_osc_rdma_rget_accumulate_…
2017-05-26 09:37:51 -06:00
Gilles Gouaillardet
47ebfaa60d Merge pull request #3451 from mkurnosov/reduce-allreduce-rebenseifner
coll: Add Rabenseifner's algorithm for Reduce and Allreduce
2017-05-26 21:00:30 +09:00
Mikhail Kurnosov
f6e2d4ab04 coll: Add Rabenseifner's algorithm for Reduce and Allreduce
A component with implementation of R. Rabenseifner's algorithm for Reduce and Allreduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather or an allgather.

Current limitations:
  -- count >= 2^{\floor{\log_2 p}}
  -- commutative operations only
  -- intra-communicators onl

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>

coll/spacc: Modify implementation to use `ompi_coll_base_sendrecv()`

Replace irecv() + isend() + ompi_request_wait() to ompi_coll_base_sendrecv().

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2017-05-26 14:33:35 +07:00
Gilles Gouaillardet
0f79259b94 osc/rdma: use extent of the appropriate datatype in ompi_osc_rdma_rget_accumulate_internal()
origin_datatype and target_datatype might be different and hence have different extent,
so use either origin_extent or target_extent when appropriate.

Refs open-mpi/ompi#3569

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-26 13:59:38 +09:00
Geoff Paulsen
93078ad824 Merge pull request #3551 from markalle/1sided_some_single
1sided with some hosts single rank -- Fixes #3548
2017-05-25 13:59:48 -05:00
Mark Allen
df14cbf039 fix for buffer length check (rdma osc w/ odd datatypes)
The osc_rdma_get_remote_segment() has the 3rd and 4th args as
* target_disp
* length
which it uses to determine if the rdma falls within the bounds of
the window or not (actually it only checks the upper bound, but I'm
okay with that).

Anyway the caller previously was passing in the length argument as
    target_datatype->super.size * target_count
which which doesn't really represent the number of bytes after target_disp
for which data exists. In particular I could create a datatype as
    { disp -4, len 4 } and use target_disp 4
and that would be bytes 0-3 of the window where the original code
would think it was bytes 4-7 and could abort at the range check.

Ive changed it to use the opal_datatype_span() function.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-24 19:10:39 -04:00
Mark Allen
36f51bca26 yalla with irregular contig datatype -- Fixes 3566
Yalla has a macro PML_YALLA_INIT_MXM_REQ_DATA that checks if a datatype
is contiguous via opal_datatype_is_contiguous_memory_layout(dt,count)
and if so it selects a size and lb that presumably is what will rdma, as
            ompi_datatype_type_size(_dtype, &size); \
            ompi_datatype_type_lb(_dtype, &lb); \

This failed when I gave it a datatype constructed as [ ...] with extent 4.
What I mean by that datatype is
    lens[0] = 3;
    disps[0] = 1;
    types[0] = MPI_CHAR;
    MPI_Type_struct(1, lens, disps, types, &tmpdt);
    MPI_Type_create_resized(tmpdt, 0, 4, &mydt);
So there are 3 chars at offset 1, and the LB is 0 and the UB is 4.

So that macro decides that size=4 and lb=0 and later I suppose size is getting
updated to 3 for the final rdma, and so a send of a buffer
[ 0 1 2 3 ] gets recved as [ 0 1 2 _ ]. I think it should use the true lb
and the true extent.

For "regular" contig datatypes it would be the same, and for the irregular
ones that are still deemed contiguous by that utility function it should
still be the right thing to use.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-23 20:56:12 -04:00
Mark Allen
c9f31a8d39 fix for 1sided with some hosts single rank
See bug report
    https://github.com/open-mpi/ompi/issues/3548

If a 1sided test is launched -host hostA:2,hostB:1 some of the ranks
call allocate_state_single() and others call allocate_state_shared().
These functions were producing different values for module->state_size
but that's used when they lookup peer info from each other in
ompi_osc_rdma_peer_setup() so they need to all have matching
module->state_offset values.

This change adds a few unused bytes in the memory allocate_state_single()
creates so it matches.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-22 15:10:49 -04:00
Geoff Paulsen
50f9287c03 Merge pull request #2941 from markalle/pr/mpi-info-update2
Finally Merging this in.  MPI_*_get_info/set_info().
Targeting v3.1 release.  @hjelmn were you interested in switching some internal pieces to begin using this?  Should we target v3.1 (or whatever we call the Oct 15th release?)
2017-05-22 09:22:04 -05:00
Mark Allen
482d84b6e5 fixes for Dave's get/set info code
The expected sequence of events for processing info during object creation
is that if there's an incoming info arg, it is opal_info_dup()ed into the obj
at obj->s_info first. Then interested components register callbacks for
keys they want to know about using opal_infosubscribe_infosubscribe().

Inside info_subscribe_subscribe() the specified callback() is called with
whatever matching k/v is in the object's info, or with the default. The
return string from the callback goes into the new k/v stored in info, and
the input k/v is saved as __IN_<key>/<val>. It's saved the same way
whether the input came from info or whether it was a default. A null return
from the callback indicates an ignored key/val, and no k/v is stored for
it, but an __IN_<key>/<val> is still kept so we still have access to the
original.

At MPI_*_set_info() time, opal_infosubscribe_change_info() is used. That
function calls the registered callbacks for each item in the provided info.
If the callback returns non-null, the info is updated with that k/v, or if
the callback returns null, that key is deleted from info. An __IN_<key>/<val>
is saved either way, and overwrites any previously saved value.

When MPI_*_get_info() is called, opal_info_dup_mpistandard() is used, which
allows relatively easy changes in interpretation of the standard, by looking
at both the <key>/<val> and __IN_<key>/<val> in info. Right now it does
  1. includes system extras, eg k/v defaults not expliclty set by the user
  2. omits ignored keys
  3. shows input values, not callback modifications, eg not the internal values

Currently the callbacks are doing things like
    return some_condition ? "true" : "false"
that is, returning static strings that are not to be freed. If the return
strings start becoming more dynamic in the future I don't see how unallocated
strings could support that, so I'd propose a change for the future that
the callback()s registered with info_subscribe_subscribe() do a strdup on
their return, and we change the callers of callback() to free the strings
it returns (there are only two callers).

Rough outline of the smaller changes spread over the less central files:
  comm.c
    initialize comm->super.s_info to NULL
    copy into comm->super.s_info in comm creation calls that provide info
    OBJ_RELEASE comm->super.s_info at free time
  comm_init.c
    initialize comm->super.s_info to NULL
  file.c
    copy into file->super.s_info if file creation provides info
    OBJ_RELEASE file->super.s_info at free time
  win.c
    copy into win->super.s_info if win creation provides info
    OBJ_RELEASE win->super.s_info at free time

  comm_get_info.c
  file_get_info.c
  win_get_info.c
    change_info() if there's no info attached (shouldn't happen if callbacks
      are registered)
    copy the info for the user

The other category of change is generally addressing compiler warnings where
ompi_info_t and opal_info_t were being used a little too interchangably. An
ompi_info_t* contains an opal_info_t*, at &(ompi_info->super)

Also this commit updates the copyrights.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-05-17 01:12:49 -04:00
Todd Kordenbrock
27ee862964 mtl-portals4: in rendezvous, reissue PtlGet() if it fails
This commit fixes a race condition in the rendezvous protocol.  The
race occurs because the sender does not wait for the link event on the
send buffer.  Even though this has not been seen in the wild, it is
possible for the receiver to issue the PtlGet() before the ME is
linked which causes a NAK at the receiver.  This commit resolves this
race by reissuing the PtlGet() when a NAK occurs.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-05-15 13:11:13 -05:00
David Solt
50aa143ab6 Major structural changes to data types: .super infosubscriber
ompi_communicator_t, ompi_win_t, ompi_file_t all have a super class of type opal_infosubscriber_t instead of a base/super type of opal_object_t (in previous code comm used c_base, but file used super).  It may be a bit bold to say that being a subscriber of MPI_Info is the foundational piece that ties these three things together, but if you object, then I would prefer to turn infosubscriber into a more general name that encompasses other common features rather than create a different super class.  The key here is that we want to be able to pass comm, win and file objects as if they were opal_infosubscriber_t, so that one routine can heandle all 3 types of objects being passed to it.

MPI_INFO_NULL is still an ompi_predefined_info_t type since an MPI_Info is part of ompi but the internal details of the underlying information concept is part of opal.

An ompi_info_t type still exists for exposure to the user, but it is simply a wrapper for the opal object.

Routines such as ompi_info_dup, etc have all been moved to opal_info_dup and related to the opal directory.

Fortran to C translation tables are only used for MPI_Info that is exposed to the application and are therefore part of the ompi_info_t and not the opal_info_t

The data structure changes are primarily in the following files:

    communicator/communicator.h
    ompi/info/info.h
    ompi/win/win.h
    ompi/file/file.h

The following new files were created:

    opal/util/info.h
    opal/util/info.c
    opal/util/info_subscriber.h
    opal/util/info_subscriber.c

This infosubscriber concept is that communicators, files and windows can have subscribers that subscribe to any changes in the info associated with the comm/file/window.  When xxx_set_info is called, the new info is presented to each subscriber who can modify the info in any way they want.  The new value is presented to the next subscriber and so on until all subscribers have had a chance to modify the value.  Therefore, the order of subscribers can make a difference but we hope that there is generally only one subscriber that cares or modifies any given key/value pair.  The final info is then stored and returned by a call to xxx_get_info.

The new model can be seen in the following files:

    ompi/mpi/c/comm_get_info.c
    ompi/mpi/c/comm_set_info.c
    ompi/mpi/c/file_get_info.c
    ompi/mpi/c/file_set_info.c
    ompi/mpi/c/win_get_info.c
    ompi/mpi/c/win_set_info.c

The current subscribers where changed as follows:

    mca/io/ompio/io_ompio_file_open.c
    mca/io/ompio/io_ompio_module.c
    mca/osc/rmda/osc_rdma_component.c (This one actually subscribes to "no_locks")
    mca/osc/sm/osc_sm_component.c (This one actually subscribes to "blocking_fence" and "alloc_shared_contig")

Signed-off-by: Mark Allen <markalle@us.ibm.com>

Conflicts:
	AUTHORS
	ompi/communicator/comm.c
	ompi/debuggers/ompi_mpihandles_dll.c
	ompi/file/file.c
	ompi/file/file.h
	ompi/info/info.c
	ompi/mca/io/ompio/io_ompio.h
	ompi/mca/io/ompio/io_ompio_file_open.c
	ompi/mca/io/ompio/io_ompio_file_set_view.c
	ompi/mca/osc/pt2pt/osc_pt2pt.h
	ompi/mca/sharedfp/addproc/sharedfp_addproc.h
	ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c
	ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c
	ompi/mpi/c/lookup_name.c
	ompi/mpi/c/publish_name.c
	ompi/mpi/c/unpublish_name.c
	opal/mca/mpool/base/mpool_base_alloc.c
	opal/util/Makefile.am
2017-05-12 14:41:05 -04:00
Matias A Cabral
644641d06f PSM and PSM2 MTLs check on the max message size allowed by API.
OMPI send and receive mesages use size_t for the lenght while PSM and PSM2
psm(2)mq_send/receive use uint32_t. Type size_t is 64 bits in 64 bits arch.
Therefore, this patch adds a sanity check on the lenght of the message
and fails gracefully.

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2017-05-10 12:45:11 -07:00
Gilles Gouaillardet
a66909b8b4 Merge pull request #3488 from ggouaillardet/topic/romio314_ad_nfs
romio314: ad_nfs fixes for large files from upstream mpich
2017-05-09 16:58:02 +09:00
Gilles Gouaillardet
26f44da429 coll/base: fix mca_coll_base_alltoallv_intra_basic_inplace()
correctly handle the case when a MPI task has no data to send/recv

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 15:19:14 +09:00
Gilles Gouaillardet
eaf050cfe1 romio314: adio/ad_nfs: fix buffer overflows in ADIOI_NFS_{Read,Write}Strided
Refs: models/mpich#2338
Refs: models/mpich#2617

Signed-off-by: Rob Latham <robl@mcs.anl.gov>

(back-ported from upstream commit pmodels/mpich@642db57648)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:11:12 +09:00
Gilles Gouaillardet
02af10ce6e romio314: update NFS read/write routines for large xfers
When we updated UFS and others we left NFS alone.  HDF group would like
a fix, so here we go.

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>

(back-ported from upstream commit pmodels/mpich@684df9f4c9)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:07:47 +09:00
Jeff Squyres
7185567d50 Merge pull request #3455 from jsquyres/pr/fix-lustre-configure
Lustre configure fixes
2017-05-08 16:49:23 -04:00
Ralph Castain
ef0e0171c9 Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c).
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
Jeff Squyres
c81bc50198 fs/lustre: remove redundant/dead code
We check for liblustreapi.h in OMPI_CHECK_LUSTRE, so this code was
commented out here.  Might as well fully delete it, since it's
redundant and dead.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-05 05:28:33 -07:00
Yossi
f56847542e Merge pull request #3347 from alinask/topic/ucx-sync-send
PML UCX: handle a synchronous send.
2017-04-26 18:02:09 +03:00
Alina Sklarevich
49913c692a PML UCX: unite the code for all the sending modes.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-26 13:17:06 +03:00
Howard Pritchard
462342d148 Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Howard Pritchard
841192645b common/libfabric: move libfabric to ofi
This PR renames the common library for OFI libfabric from
libfabric to ofi.  There are a number of reasons this
is good to do:

1) its shorter and replaces 9 characters with three for
   function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
   the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
   the term "ofi" internally and also in their configure options
   relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)

There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.

This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Gilles Gouaillardet
ded63c5e0c ompi: use ompi_coll_base_sendrecv_actual() whenever possible
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-20 10:01:28 +09:00
Gilles Gouaillardet
52551d96c1 Merge pull request #3285 from ggouaillardet/topic/coll_zerobyte_messages
coll/base: always send/recv zero-byte messages
2017-04-20 09:22:47 +09:00
Gilles Gouaillardet
fa5cd0dbe5 use ptrdiff_t instead of OPAL_PTRDIFF_TYPE
since Open MPI now requires a C99, and ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:41:56 +09:00
Yossi
9ebcafd6d6 Merge pull request #3260 from derbeyn/fix_yalla
Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow
2017-04-18 11:37:48 +03:00
Alina Sklarevich
d93b67257b PML UCX: handle a synchronous send.
MCA_PML_BASE_SEND_SYNCHRONOUS

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 18:11:55 +03:00
Alina Sklarevich
eec310c99c PML/UCX/YALLA: Fix the message release call.
Set message to MPI_MESSAGE_NULL.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 14:41:13 +03:00
Nathan Hjelm
12b52b2b2c osc/pt2pt: fix infinite frag allocation loop
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-10 16:30:47 -06:00
Howard Pritchard
f5942ff23c Merge pull request #3304 from hppritcha/topic/de-ortization-of-ompi
de-ORTEfy the ompi tree
2017-04-07 14:14:41 -06:00
Noah Evans
ef29fb13cb de-ORTEfy the ompi tree
The ompi tree should be runtime independent, but over time a few
ORTE depedent definitions and functions have escaped into the ompi
tree. I'm working on my own runtime so I've used this as an opportunity
to get rid of ORTE dependencies in the ompi/ tree. I still need to go
back and change orte to conform to the new world and these changes are
untested, but I can now compile (but not link) without orte so I'm
commiting this changeset.

Signed-off-by: Noah Evans <noah.evans@gmail.com>
2017-04-07 12:35:58 -06:00
Nadia Derbey
f918d88c3e Fix yalla PML: Update previous commit after Yossofe's review
Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>
2017-04-06 07:58:26 +02:00
Gilles Gouaillardet
f3581c8259 coll/base: have alltoallv send/recv zero-bytes messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 13:44:17 +09:00
Gilles Gouaillardet
5492edd71e coll/base: have ompi_coll_base_sendrecv() send/recv zero-bytes messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 13:44:05 +09:00
Nathan Hjelm
1322e5dee8 Merge pull request #3274 from hjelmn/osc_rdma_fix
osc/rdma: fix typo in atomic code
2017-04-04 00:20:42 -06:00
Gilles Gouaillardet
5dfd4ab6ca coll/tuned: remove set-but-not-used variables
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-04 13:18:11 +09:00
Nathan Hjelm
fad0803920 osc/rdma: fix typo in atomic code
Fixes #3267

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-03 15:54:28 -06:00
Nadia Derbey
b6de94e449 Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow
Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>
2017-03-30 15:18:31 +02:00
Xin Zhao
ee952fcccd Passing estimated_num_procs to UCX init in PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Nathan Hjelm
c72fb30eb5 osc/pt2pt: fix typo
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2017-03-23 09:00:21 -06:00
Xin Zhao
6a99c60fbd Add multithreading support in PML UCX framework.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-20 19:55:00 +02:00
Jeff Squyres
760db0d5ce osc/pt2pt: fix compiler warning
Remove unused variable.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-16 05:46:11 -07:00
Jeff Squyres
1947280865 topo/treematch: squash some compiler warnings
Only define MIN/MAX if they are not already defined.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-16 05:44:26 -07:00
Nathan Hjelm
37214eda09 Merge pull request #3164 from hjelmn/ob1_pinned
pml/ob1: do not cache leave_pinned
2017-03-14 13:22:18 -06:00
Nathan Hjelm
3e7ef48c13 pml/ob1: do not cache leave_pinned
This commit fixes a bug that disabled both the RDMA pipeline and RDMA
protocols in ob1. ob1 was internally caching the values of
opal_leave_pinned and opal_leave_pinned_pipeline at init time. This is
no longer valid as opal_leave_pinned may be set by any call to a btl's
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 09:00:40 -06:00
Valentin Petrov
fe069c9570 Fixes the coll_allgather usage bug
One should use the correct module object when calling
      c_coll.coll_allgather. Otherwise there will be a segfault in the
      case, for example, when hcoll is used. In that case
      c_coll.coll_allgather = mca_coll_hcoll_allgather while
      c_coll.coll_gather_module = tuned.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-03-14 09:47:39 +02:00
Alex Mikheev
c081239f88
ompi: pml ucx: fix persistant request init CR changes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 13:26:29 +02:00
Alex Mikheev
c113c37a7a
ompi: pml ucx: fix persistant request initialization
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 10:59:41 +02:00
Nathan Hjelm
0195d15401 osc/pt2pt: flush pending fragments on lock ack
This commit addresses an issue that can occur in cases where a lot of
fragments are outstanding.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-06 13:58:46 -07:00
Edgar Gabriel
607dc2c039 Merge pull request #3103 from edgargabriel/pr/sharedfp-name-collision-fix
sharedfp/lockedfile and sm: fix the namecollision
2017-03-05 14:46:20 -06:00
Edgar Gabriel
2d462b3b80 sharedfp/lockedfile and sm: fix name collision
this fixes the issue reported by Nicolas Joly on the mailing: the sharedfp/lockedfile component does not support right now a scenario where multiple jobs read from the same input file, due to a collision of the filenames utilized for the sharedfp handle. Although not part of the oroginal report, the same occurs for the sharedfp/sm component. Add therefore the jobid to be part of the lockedfilename/sm file name.

use the OMPI_CAST_RTE_NAME macro to determine jobid

Fixes: #3098

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-05 11:28:28 -06:00
Artem Polyakov
9448814c40 ompi/pml/ucx: Fix uninitialized UCX request field.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-05 03:06:30 +07:00
Edgar Gabriel
d1fed77781 Merge pull request #3094 from edgargabriel/pr/master-lustre-priority
io/ompio: adjust the priority of the OMPIO component on lustre
2017-03-03 09:29:14 -06:00
KAWASHIMA Takahiro
39294caf04 Merge pull request #3086 from kawashima-fj/pr/coll-base-defs
coll: Update `ompi/mca/coll/base/coll_base_functions.h`
2017-03-03 18:53:00 +09:00
Edgar Gabriel
9e19834327 io/ompio: adjust the priority of the OMPIO component on lustre
this commit brings over the behavior from the 2.x series to master, mostly with the fork for the 3.x series in mind.
Also, use strncasecmp instead of two strncmps

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-02 12:10:11 -06:00
KAWASHIMA Takahiro
c4ca5e703d coll: Update ompi/mca/coll/base/coll_base_functions.h
- Support MPI-2.2 and MPI-3.0 COLL features.

  * `MPI_REDUCE_SCATTER_BLOCK`
  * neighborhood collective communication
  * nonblocking collective communication

- Add `*_BASE_ARGS` and `*_BASE_ARG_NAMES` for convenience.

- Use parameter names used in the MPI Standard.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-03-02 17:58:02 +09:00
KAWASHIMA Takahiro
96aa0d90c1 pml/bfo: Correct a function name and header filenames
These lines were incorrectly modified in 90f2940.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-03-02 16:02:53 +09:00
Alex Mikheev
152f77df59
ompi: pml ucx: fix datatype packing error in bsend
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-01 16:18:19 +02:00
Yossi Itigin
33471c44ee pml_yalla/mtl_mxm/hcoll: open memory component to activate memory hooks.
Memory hooks are now set-up on demand. pml/yalla, mtl/mxm and
coll/hcoll need the memory hooks, so make sure those are installed.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-03-01 12:12:20 +02:00
Jeff Squyres
d5266aba90 Merge pull request #2955 from jsquyres/pr/hwloc-external-fixes
Fix --with-hwloc=external
2017-02-28 14:57:07 -05:00
Josh Hursey
0006f0d7c5 Merge pull request #2773 from jjhursey/topic/hook-fwk
Add a 'hook' framework
2017-02-28 12:29:50 -06:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
Jeff Squyres
0cd3b6c235 treematch: do not include <hwloc.h>
Instead, include "opal/mca/hwloc/hwloc.h"

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:45:23 -08:00
Josh Hursey
b1c4e50500 Merge pull request #2934 from jjhursey/topic/coll-comm-restructure
Move coll structure outside of the communicator
2017-02-28 08:45:18 -06:00
Nathan Hjelm
032bcf915a osc/rdma: fix compile warning
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-27 16:26:00 -07:00
George Bosilca
366d64b7e5 Move the collective structure outside the communicator.
As we changed the ABI (forcing a major release), we can limit
the size of the predefined communicators by moving the collective
structure outside the communicator. This might have a minimal,
but unnoticeable, impact on performance. This approach has been
discussed during the January 2017 devel meeting.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 11:54:17 -06:00
Joshua Hursey
c10bbfded6 ompi/hook: Add the hook/license framework
* Include a 'demo' component that shows some of the features.
 * Currently has hooks for:
   - MPI_Initialized
     - top, bottom
   - MPI_Init_thread
     - top, bottom
   - MPI_Finalized
     - top, bottom
   - MPI_Init
     - top (pre-opal_init), top (post-opal_init), error, bottom
   - MPI_Finalize
     - top, bottom
 * Other places in ompi can 'register' to hook into any one of these places
   by passing back a component structure filled with function pointers.
 * Add a `MCA_BASE_COMPONENT_FLAG_REQUIRED` flag to the MCA structure that
   is checked by the `hook` framework. If a required, static component has
   been excluded then the `hook` framework will fail to initialize.
   - See note in `opal/mca/mca.h` as to why this is checked in the `hook`
     framework and not in `opal/mca/base/mca_base_component_find.c`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 12:05:53 -05:00
Nathan Hjelm
581bff9871 Merge pull request #3034 from hjelmn/osc_rdma_atomic
osc/rdma: make locking code more robust
2017-02-27 08:46:52 -07:00
Nathan Hjelm
4707c7c5e0 osc/rdma: make locking code more robust
Under heavy load the locking code could fail if the underlying btl
module started to return OPAL_ERR_OUT_OF_RESOURCE on atomic
operations. This commit updates the code to gracefully handle btl
errors.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-27 00:01:26 -07:00
Gilles Gouaillardet
af0b5cffb4 asm: rename the AMD64 into X86_64
in this context, AMD64 really means amd64 or em64t, so let's
rename this into X86_64 in order to avoid any confusion

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-27 15:10:50 +09:00
Yossi
fb67c966a8 Merge pull request #2944 from alex-mikheev/topic/pml_ucx_bsend
ompi: pml ucx: add support for the buffered send
2017-02-22 12:21:03 +02:00
Alex Mikheev
b015c8bb48 ompi: pml ucx: add support for the buffered send
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-21 17:19:22 +02:00
Gilles Gouaillardet
4184c01be5 Merge pull request #2393 from bosilca/topic/no_predefined_ddt_refcount
Don't refcount the predefined datatypes.
2017-02-21 09:38:11 +09:00
Todd Kordenbrock
048f757d9f osc-portals4: add support for noncontiguous datatypes
This commit implements onesided operations for noncontiguous
datatypes using two different algorithms.

 * If the result and/or origin datatype is noncontiguous and the
   target datatype is contiguous, then an iovec MD is created for
   the result and origin.  The operation is performed using a
   single Portals4 call (unless it exceeds the max message size).
 * If the target datatype is noncontigous, then an algorithm
   similar to the one in osc-rdma is used to loop over the
   contiguous blocks of each datatype.  The operation is
   performed using multiple Portals4 calls.

This commit ensures that individual operations do not exceed the
max atomic size or the max message size supported by the device.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-02-15 16:17:13 -06:00
Gilles Gouaillardet
cd4537193c osc/sm: fix MPI_Win_allocate_shared() alignment
add padding so the memory allocated by MPI_Win_allocate_shared()
is 64 bytes aligned.

Thanks Joseph Schuchart for the bug report

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-15 13:40:48 +09:00
Josh Hursey
0b273c2561 Merge pull request #2808 from jjhursey/fix/ibm/reduce-local-to-coll
coll: Move reduce_local into the coll framework
2017-02-14 15:54:15 -06:00
Nathan Hjelm
cc4a0fabcf Merge pull request #2727 from hjelmn/osc_rdma
osc/rdma: fix typo in check for MPI_MODE_NOCHECK
2017-02-14 10:50:33 -07:00
Joshua Hursey
78006f93a4 coll: Move reduce_local into the coll framework
* Since we are adding a new function to `mca_coll_base_module_2_1_0_t`
   we need to increase the version of the module structure to `2_2_0`.
 * Add a comment just above the PREDEFINED_COMMUNICATOR_PAD describing
   it's purpose and when it should change. To help future developers
   trying to answer the question noted in the comment.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-14 08:56:07 -06:00
Gilles Gouaillardet
e70a30cca4 coll/libnbc: optimize zero size ialltoall{v,w} with MPI_IN_PLACE
and incidentally avoids malloc(0)

Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#2945

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-13 15:21:28 +09:00
Gilles Gouaillardet
12949547f4 coll/libnbc: fix a2aw_sched_linear() with zero size datatype or zero count
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-13 15:21:28 +09:00
Joshua Hursey
383330a50d coll/basic: Expand check for negative input values
* Negative values are parameter errors for neighborhood collectives
   - Add checks to the mpi/c interface `MPI_PARAM_CHECK`
 * Fix a success check for neighbor_alltoallw with dist_graph

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-08 14:26:32 -06:00