1
1
Граф коммитов

4805 Коммитов

Автор SHA1 Сообщение Дата
Howard Pritchard
45e2771162 configure: remove CR/FT related options
As part of the process for addressing removal of CR/FT related
code from master (and hence from the 3.0.0 release), it was agreed
at the OMPI devel F2F on 7/13/17 that we'd break this in to two
pieces:

1) remove the configure arguments (fewer changes)
2) remove all the CR/FT code, etc. in a subsequent bigger commit
    that may not make it in to 3.0.0 in time.

By doing 1), the available configure options would not change
in a subsequent 3.0.x release if we end up not being able to do 2)
before 3.0.0 is released.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-17 13:48:59 -06:00
Nathan Hjelm
e5343c16c0 btl/vader: remove debug code that should not be in a release
References #3902. Close when in master, v3.0.x, and v2.x.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-17 11:58:47 -05:00
Gilles Gouaillardet
6e35cfc19a btl/sm: fix misc memory leak
as reported by Coverity with CID 1415105

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-16 13:02:55 +09:00
Jeff Squyres
5cf64e6555 btl/sm: effectively delete the SM BTL
If a user explicitly asks for the "sm" BTL, print a show_help message
saying that the SM BTL is dead, and the user should be using "vader".

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-07-15 09:33:08 -07:00
Artem Polyakov
0929c32cd8 Merge pull request #3893 from karasevb/yoda_spml_remove
Remove Yoda SPML
2017-07-15 08:47:31 -07:00
Gilles Gouaillardet
9124afbeae pmix: do not invoke PMIX_INFO_CREATE() with a zero size
Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#3854

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-14 15:00:05 +09:00
Boris Karasev
77c50efb95 Yoda SPML is removed
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-07-14 08:47:16 +03:00
Artem Polyakov
4d3e22e815 Merge pull request #3870 from hppritcha/topic/repair_s2_launch
pmix/s2: fix srun native launch for pmi2
2017-07-13 12:45:22 -05:00
Howard Pritchard
eeb91bc82b pmix/s2: fix srun native launch for pmi2
recent changes that broke native launch on cray
using srun or aprun was also broke native launch
using pmi2.

This commit fixes this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-12 17:45:52 -06:00
Jeff Squyres
ccf17808b6 Merge pull request #3258 from markalle/pr/symbol_name_pollution
symbol name pollution
2017-07-12 16:19:25 -05:00
Artem Polyakov
832f1b03a4 Merge pull request #3790 from artpol84/orte/iof_sbatch
orte/iof: Address the case when output is a regular file
2017-07-12 09:38:01 -05:00
Gilles Gouaillardet
a111fc8ff2 opal/datatype: fix opal_dt_swap_long_double if no IEEE754_H
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Gilles Gouaillardet
8fd08b933a opal/datatype: add minimal support to convert long double
between ieee 754 quadruple precision and extended precision formats.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Gilles Gouaillardet
9118777b66 opal/ddt: use optimized description when packing contiguous datatypes
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Gilles Gouaillardet
32606ad476 btl/tcp: fix heterogeneous support for put / large messages
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-12 10:27:45 +09:00
Nathan Hjelm
c18007d095 btl/vader: work around ob1 pending fragment bug
This commit ensures that the pml callback is always made when
sending fragments. This is needed to avoid #3845. Once that is
fixed the #if 0'd code can be restored.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-11 15:59:56 -06:00
Howard Pritchard
550e8c4afe Merge pull request #3842 from hppritcha/topic/fix_cray_pmix_problem
pmix/cray: add a bit of debug output
2017-07-11 08:29:56 -06:00
Howard Pritchard
26a8142c97 pmix/cray: add a bit of debug output
add a bit of debug output to help with pmix finalize issues

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-11 05:45:49 -05:00
Mark Allen
552216f9ba scripted symbol name change (ompi_ prefix)
Passed the below set of symbols into a script that added ompi_ to them all.

Note that if processing a symbol named "foo" the script turns
    foo  into  ompi_foo
but doesn't turn
    foobar  into  ompi_foobar

But beyond that the script is blind to C syntax, so it hits strings and
comments etc as well as vars/functions.

    coll_base_comm_get_reqs
    comm_allgather_pml
    comm_allreduce_pml
    comm_bcast_pml
    fcoll_base_coll_allgather_array
    fcoll_base_coll_allgatherv_array
    fcoll_base_coll_bcast_array
    fcoll_base_coll_gather_array
    fcoll_base_coll_gatherv_array
    fcoll_base_coll_scatterv_array
    fcoll_base_sort_iovec
    mpit_big_lock
    mpit_init_count
    mpit_lock
    mpit_unlock
    netpatterns_base_err
    netpatterns_base_verbose
    netpatterns_cleanup_narray_knomial_tree
    netpatterns_cleanup_recursive_doubling_tree_node
    netpatterns_cleanup_recursive_knomial_allgather_tree_node
    netpatterns_cleanup_recursive_knomial_tree_node
    netpatterns_init
    netpatterns_register_mca_params
    netpatterns_setup_multinomial_tree
    netpatterns_setup_narray_knomial_tree
    netpatterns_setup_narray_tree
    netpatterns_setup_narray_tree_contigous_ranks
    netpatterns_setup_recursive_doubling_n_tree_node
    netpatterns_setup_recursive_doubling_tree_node
    netpatterns_setup_recursive_knomial_allgather_tree_node
    netpatterns_setup_recursive_knomial_tree_node
    pml_v_output_close
    pml_v_output_open
    intercept_extra_state_t
    odls_base_default_wait_local_proc
    _event_debug_mode_on
    _evthread_cond_fns
    _evthread_id_fn
    _evthread_lock_debugging_enabled
    _evthread_lock_fns
    cmd_line_option_t
    cmd_line_param_t
    crs_base_self_checkpoint_fn
    crs_base_self_continue_fn
    crs_base_self_restart_fn
    event_enable_debug_output
    event_global_current_base_
    event_module_include
    eventops
    sync_wait_mt
    trigger_user_inc_callback
    var_type_names
    var_type_sizes

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-07-11 02:13:23 -04:00
Mark Allen
efc25168cd symbol name pollution: making some vars static
As part of addressing symbol name pollution, I'm switching a few
vars/functions to static.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-07-11 02:13:22 -04:00
Gilles Gouaillardet
ff2dd69533 opal/util: silence warning in opal_info_dup_mode()
as reported by coverity with CID 1414729

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-11 14:40:37 +09:00
Gilles Gouaillardet
85ff3ebad1 opal: fix return status of opal_info_set()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-11 13:58:15 +09:00
Gilles Gouaillardet
92441accc9 opal/info: fix recursive deadlock in opal_info_dup_mode()
use opal_info_{get,set}_nolock() instead of opal_info_{get,set}()
since the former can be invoked when the info lock is being held.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-10 14:51:46 +09:00
Ralph Castain
a190b4b89f Prefix the MB macro in one more place
Fixes #3830

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-07 06:07:47 -07:00
Ralph Castain
2a580fa71e Merge pull request #3801 from rhc54/topic/hetero
Detect that we have a mix of BE/LE in the system
2017-07-06 15:29:06 -07:00
Ralph Castain
ed43492867 Not really necessary, but technically correct
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-06 06:00:03 -07:00
Ralph Castain
31130a4bee Replace syntax with something less strictly C99
Fixes #3809

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-05 16:54:36 -07:00
Ralph Castain
2753f53e6d Detect that we have a mix of BE/LE in the system, provide a warning that OMPI doesn't currently support this environment, and error out
Fixes #2817

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-03 15:47:05 -07:00
Howard Pritchard
1f2f3db553 pmix/cray: fix handling of multiple finis
The fini code for cray pmix wasn't correct.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-03 14:30:34 -05:00
Artem Polyakov
d9ad918a14 orte/iof: Address the case when output is a regular file
Regular files are always write-ready, so non-blocking I/O does not
give any benefits for them.
More than that - if libevent is using "epoll" to track fd events,
epoll_ctl will refuse attempt to add an fd pointing to a regular
file descriptor with EPERM.
This fix checks the object referenced by fd and avoids event_add
using event_active instead.

In the original configuration that uncovered this issue "epoll"
was used in libevent, it was triggering the following warning
message:
"[warn] Epoll ADD(1) on fd 0 failed.  Old events were 0; read
change was 1 (add); write change was 0 (none): Operation not
permitted"
And the side effect was accumulation of all output in mpirun
memory and actually writing it only at mpirun exit.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-07-01 02:24:14 +07:00
Ralph Castain
9178219e6b Deregister event handlers only on final call to finalize. Ensure we pass PMIx mca params
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-28 15:00:43 -07:00
Ralph Castain
d619de4f4c Fix a threadlock when notifying clients of failures
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-28 08:58:41 -07:00
Ralph Castain
e6c2a8d346 Track PMIx v2.0.1
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-26 09:34:57 -07:00
bosilca
d55b666834 Topic/monitoring (#3109)
Add a monitoring PML, OSC and IO. They track all data exchanges between processes,
with capability to include or exclude collective traffic. The monitoring infrastructure is
driven using MPI_T, and can be tuned of and on any time o any communicators/files/windows.
Documentations and examples have been added, as well as a shared library that can be
used with LD_PRELOAD and that allows the monitoring of any application.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>


* add ability to querry pml monitorinting results with MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix a convertion problem and add a comment about the lack of component
retain in the new component infrastructure.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Allow the pvar to be written by invoking the associated callback.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Various fixes for the monitoring.
Allocate all counting arrays in a single allocation
Don't delay the initialization (do it at the first add_proc as we
know the number of processes in MPI_COMM_WORLD)

Add a choice: with or without MPI_T (default).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Cleanup for the monitoring module.
Fixed few bugs, and reshape the operations to prepare for
global or communicator-based monitoring. Start integrating
support for MPI_T as well as MCA monitoring.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Adding documentation about how to use pml_monitoring component.

Document present the use with and without MPI_T.
May not reflect exactly how it works right now, but should reflects
how it should work in the end.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change rank into MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c.
Change mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Improve monitoring support (including integration with MPI_T)

Use mca_pml_monitoring_enable to check status state. Set mca_pml_monitoring_current_filename iif parameter is set
Allow 3 modes for pml_monitoring_enable_output: - 1 : stdout; - 2 : stderr; - 3 : filename
Fix test : 1 for differenciated messages, >1 for not differenciated. Fix output.
Add documentation for pml_monitoring_enable_output parameter. Remove useless parameter in example
Set filename only if using mpi tools
Adding missing parameters for fprintf in monitoring_flush (for output in std's cases)
Fix expected output/results for example header
Fix exemple when using MPI_Tools : a null-pointer can't be passed directly. It needs to be a pointer to a null-pointer
Base whether to output or not on message count, in order to print something if only empty messages are exchanged
Add a new example on how to access performance variables from within the code
Allocate arrays regarding value returned by binding

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add overhead benchmark, with script to use data and create graphs out of the results
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix segfault error at end when not loading pml
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Start create common monitoring module. Factorise version numbering
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix microbenchmarks script
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Improve readability of code

NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add osc monitoring component

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add error checking if running out of memory in osc_monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Resolve brutal segfault when double freeing filename
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Moving to ompi/mca/common the proper parts of the monitoring system
Using common functions instead of pml specific one. Removing pml ones.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add calls to record monitored data from osc. Use common function to translate ranks.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix test_overhead benchmark script distribution

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix linking library with mca/common

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add passive operations in monitoring_test

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix from rank calculation. Add more detailed error messages

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix osc_monitoring mget_message_count function call

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change common_monitoring function names to respect the naming convention. Move to common_finalize the common parts of finalization. Add some comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring common output system

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add error message when trying to flush to a file, and open fails. Remove erroneous info message when flushing wereas the monitoring is already disabled.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Consistent output file name (with and without MPI_T).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Always output to a file when flushing at pvar_stop(flush).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update the monitoring documentation.
Complete informations from HowTo. Fix a few mistake and typos.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use the world_rank for printf's.
Fix name generation for output files when using MPI_T. Minor changes in benchmarks starting script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add security check for unique initialization for osc monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean the amout of symbols available outside mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Remove use of __sync_* built-ins. Use opal_atomic_* instead.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Allocate the hashtable on common/monitoring component initialization. Define symbols to set the values for error/warning/info verbose output. Use opal_atomic instead of built-in function in osc/monitoring template initialization.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Deleting now useless file : moved to common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram ditribution of message sizes

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram array of 2-based log of message sizes. Use simple call to reset/allocate arrays in common_monitoring.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add informations in dumping file. Separate per category (pt2pt/osc/coll (to come)) monitored data

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add coll component for collectives communications monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix warning messages : use c_name as the magic id is not always defined. Moreover, there was a % missing. Add call to release underlying modules. Add debug info messages. Add warning which may lead to further analysis.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix log10_2 constant initialization. Fix index calculation for histogram array.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add debug info messages to follow more easily initialization steps.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Group all the var/pvar definitions to common_monitoring. Separate initial filename from the current on, to ease its lifetime management. Add verifications to ensure common is initialized once only. Move state variable management to common_monitoring.
monitoring_filter only indicates if filtering is activated.
Fix out of range access in histogram.
List is not used with the struct mca_monitoring_coll_data_t, so heritate only from opal_object_t.
Remove useless dead code.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix invalid memory allocation. Initialize initial_filename to empty string to avoid invalid read in mca_base_var_register.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Don't install the test scripts.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix missing procs in hashtable. Cache coll monitoring data.
    * Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
    * Cache monitoring data relative to collectives operations on creation.
    * Remove double caching.
    * Use same proc name definition for hash table when inserting and
      when retrieving.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use intermediate variable to avoid invalid write while retrieving ranks in hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add missing release of the last element in flush_all. Add release of the hashtable in finalize.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use a linked list instead of a hashtable to keep tracks of communicator data. Add release of the structure at finalize time.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Set world_rank from hashtable only if found

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use predefined symbol from opal system to print int

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Move collective monitoring data to a hashtable. Add pvar to access the monitoring_coll_data. Move functions header to a private file only to be used in ompi/mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix pvar registration. Use OMPI_ERROR isntead of -1 as returned error value. Fix releasing of coll_data_t objects. Affect value only if data is found in the hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add automated check (with MPI_Tools) of monitoring.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix procs list caching in common_monitoring_coll_data_t

    * Fix monitoring_coll_data type definition.
    * Use size(COMM_WORLD)-1 to determine max number of digits.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix coll operations monitoring. Update check_monitoring accordingly to the added pvar. Fix monitoring array allocation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Documentation update.
Update and then move the latex and README documentation to a more logical place

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof. Unify OSC related outputs.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add the use of a machine file for overhead benchmark

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Check for out-of-bound write in histogram

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix common_monitoring_cache object init for MPI_COMM_WORLD

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add RDMA benchmarks to test_overhead
Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add computation of average and median of overheads. Add comments and copyrigths to the test_overhead script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add technical documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adapt to the new definition of communicators

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update expected output in test/monitoring/monitoring_test.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add dumping histogram in edge case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add consistency in header inclusion.
Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
Add sanity check before emptying hashtable.
Fix typos in documentation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* misc monitoring fixes

* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am
and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

* Cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Changing int64_t to size_t.
Keep the size_t used accross all monitoring components.
Adapt the documentation.
Remove useless MPI_Request and MPI_Status from monitoring_test.c.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add parameter for RMA test case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean the maximum bound computation for proc list dump.
Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0dbe5.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add communicator-specific monitored collective data reset

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring scripts to the 'make dist'
Also install them in the build and the install directories.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-26 18:21:39 +02:00
Ralph Castain
79fd359848 Merge pull request #3713 from rhc54/topic/ofi
Enable use of OFI fabrics for launch and other collective operations.…
2017-06-25 11:47:40 -07:00
Ralph Castain
ed85512a7c Update to track PMIx v2.0.1
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-25 07:29:32 -07:00
Ralph Castain
ef56c7d47a Correctly transfer size_t data fields
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-24 20:11:54 -07:00
Ralph Castain
f4411c4393 Enable use of OFI fabrics for launch and other collective operations. Update the PMIx repo to the latest master to get the required support for the server to "push" modex info, and to retrieve all its own "modex" values for sending back to mpirun. Have mpirun cache them in its local modex hash as OFI goes point-to-point direct and doesn't route - so the remote daemons don't need a copy of this connection info.
Remove the opal_ignore from the RML/OFI component, but disable that component unless the user specifically requests it via the "rml_ofi_desired=1" MCA param. This will let us test compile in various environments without interfering with operations while we continue to debug

Fix an error when computing the number of infos during server init

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-23 19:57:21 -07:00
Ralph Castain
8263efff65 Fix uninitialized variables
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-23 11:12:26 -07:00
Ralph Castain
ecacde0cd5 Purge whitespace errors
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-23 11:12:14 -07:00
Nathan Hjelm
b638612a98 Merge pull request #3744 from hjelmn/ompi_cov
opal: fix coverity issues
2017-06-23 09:09:42 -06:00
Nathan Hjelm
db973437e1 opal: fix coverity issues
Fixes coverity CIDs 1412984, and 1412983.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-23 08:15:34 -06:00
Nathan Hjelm
9c621ad5a4 opal/info: fix abstraction break
The new info infrastructure introduced an abstration break by
including mpi.h and using MPI_ constants in opal. This commit fixes
the break by changing the constants to their opal equivalents.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-23 08:03:01 -06:00
Ralph Castain
6ec2ad5288 Fix the pmix_query API when it asks for something that returns an array of pmix_info_t. Protect the PMIX_INFO_FREE macro from NULL arrays. Update the mpi_memprobe scaling test
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-22 20:11:36 -07:00
Nathan Hjelm
4252258338 Merge pull request #3721 from hjelmn/list_cleanup
opal: use opal_list_t convienience macros
2017-06-22 09:12:23 -06:00
Ralph Castain
3e78f84093 Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-21 13:19:51 -07:00
Ralph Castain
cba127bc43 Update the ext2x component to match the internal one
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-20 11:42:14 -07:00
Nathan Hjelm
ffd8ee2dfd opal: use opal_list_t convienience macros
This commit cleans up code in opal to use OPAL_LIST_FOREACH(_SAFE),
OPAL_LIST_DESTRUCT, and OPAL_LIST_RELEASE.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-06-20 12:37:12 -06:00
Ralph Castain
814e858082 Merge pull request #3696 from rhc54/topic/pmix3
Update PMIx and integration glue
2017-06-20 10:38:29 -07:00
Ralph Castain
952726c121 Update to latest PMIx master - equivalent to 2.0rc2. Update the thread support in the opal/pmix framework to protect the framework-level structures.
This now passes the loop test, and so we believe it resolves the random hangs in finalize.

Changes in PMIx master that are included here:

* Fixed a bug in the PMIx_Get logic
* Fixed self-notification procedure
* Made pmix_output functions thread safe
* Fixed a number of thread safety issues
* Updated configury to use 'uname -n' when hostname is unavailable

Work on cleaning up the event handler thread safety problem
Rarely used functions, but protect them anyway
Fix the last part of the intercomm problem
Ensure we don't cover any PMIx calls with the framework-level lock.
Protect against NULL argv comm_spawn

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-20 09:02:15 -07:00