It is not possible to use the patcher based memory hooks without
hooking madvise (MADV_DONTNEED). This commit updates the patcher
memory hooks to always hook madvise. This should be safe with recent
rcache updates.
References #3685. Close when merged into v2.0.x, v2.x, and v3.0.x.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The current VMA cache implementation backing rcache/grdma can run into
a deadlock situation in multi-threaded code when madvise is hooked and
the c library uses locks. In this case we may run into the following
situation:
Thread 1:
...
free () <- Holding libc lock
madvice_hook ()
vma_iteration () <- Blocked waiting for vma lock
Thread 2:
...
vma_insert () <- Holding vma lock
vma_item_new ()
malloc () <- Blocked waiting for libc lock
To fix this problem we chose to remove the madvise () hook but that
fix is causing issue #3685. This commit aims to greatly reduce the
chance that the deadlock will be hit by putting vma items into a free
list. This moves the allocation outside the vma lock. In general there
are a relatively small number of vma items so the default is to
allocate 2048 vma items. This default is configurable but it is likely
the number is too large not too small.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Every modern compiler supports either inline assembly or builtin atomic
operations. Because of this it is time to delete all the code associated
with pre-built atomics.
This commit also clean out the DEC and XLC asm checks. Neither check
does anything and the XLC compiler supports GCC ASM.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Ignore errors caused by remote side having exited when closing CUDA IPC mappings.
openmpi/ompi#3244
Signed-off-by: Sylvain Jeaugey <sjeaugey@nvidia.com>
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
All symbols that need to be accessed from a MCA component must be marked
explicitly as visible using PMIX_EXPORT. This patch allows current trunk
to almost work on OsX. More on the devel mailing list.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
passed to make it all flow thru the opal/pmix "put/get" operations. Update the PMIx code to latest master to pickup some required behaviors.
Remove the no-longer-required get_contact_info and set_contact_info from the RML layer.
Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing wi
ll be beneficial at large scales. Leave it "off" by default.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
As part of the process for addressing removal of CR/FT related
code from master (and hence from the 3.0.0 release), it was agreed
at the OMPI devel F2F on 7/13/17 that we'd break this in to two
pieces:
1) remove the configure arguments (fewer changes)
2) remove all the CR/FT code, etc. in a subsequent bigger commit
that may not make it in to 3.0.0 in time.
By doing 1), the available configure options would not change
in a subsequent 3.0.x release if we end up not being able to do 2)
before 3.0.0 is released.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
If a user explicitly asks for the "sm" BTL, print a show_help message
saying that the SM BTL is dead, and the user should be using "vader".
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
recent changes that broke native launch on cray
using srun or aprun was also broke native launch
using pmi2.
This commit fixes this problem.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This commit ensures that the pml callback is always made when
sending fragments. This is needed to avoid #3845. Once that is
fixed the #if 0'd code can be restored.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Passed the below set of symbols into a script that added ompi_ to them all.
Note that if processing a symbol named "foo" the script turns
foo into ompi_foo
but doesn't turn
foobar into ompi_foobar
But beyond that the script is blind to C syntax, so it hits strings and
comments etc as well as vars/functions.
coll_base_comm_get_reqs
comm_allgather_pml
comm_allreduce_pml
comm_bcast_pml
fcoll_base_coll_allgather_array
fcoll_base_coll_allgatherv_array
fcoll_base_coll_bcast_array
fcoll_base_coll_gather_array
fcoll_base_coll_gatherv_array
fcoll_base_coll_scatterv_array
fcoll_base_sort_iovec
mpit_big_lock
mpit_init_count
mpit_lock
mpit_unlock
netpatterns_base_err
netpatterns_base_verbose
netpatterns_cleanup_narray_knomial_tree
netpatterns_cleanup_recursive_doubling_tree_node
netpatterns_cleanup_recursive_knomial_allgather_tree_node
netpatterns_cleanup_recursive_knomial_tree_node
netpatterns_init
netpatterns_register_mca_params
netpatterns_setup_multinomial_tree
netpatterns_setup_narray_knomial_tree
netpatterns_setup_narray_tree
netpatterns_setup_narray_tree_contigous_ranks
netpatterns_setup_recursive_doubling_n_tree_node
netpatterns_setup_recursive_doubling_tree_node
netpatterns_setup_recursive_knomial_allgather_tree_node
netpatterns_setup_recursive_knomial_tree_node
pml_v_output_close
pml_v_output_open
intercept_extra_state_t
odls_base_default_wait_local_proc
_event_debug_mode_on
_evthread_cond_fns
_evthread_id_fn
_evthread_lock_debugging_enabled
_evthread_lock_fns
cmd_line_option_t
cmd_line_param_t
crs_base_self_checkpoint_fn
crs_base_self_continue_fn
crs_base_self_restart_fn
event_enable_debug_output
event_global_current_base_
event_module_include
eventops
sync_wait_mt
trigger_user_inc_callback
var_type_names
var_type_sizes
Signed-off-by: Mark Allen <markalle@us.ibm.com>
use opal_info_{get,set}_nolock() instead of opal_info_{get,set}()
since the former can be invoked when the info lock is being held.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Regular files are always write-ready, so non-blocking I/O does not
give any benefits for them.
More than that - if libevent is using "epoll" to track fd events,
epoll_ctl will refuse attempt to add an fd pointing to a regular
file descriptor with EPERM.
This fix checks the object referenced by fd and avoids event_add
using event_active instead.
In the original configuration that uncovered this issue "epoll"
was used in libevent, it was triggering the following warning
message:
"[warn] Epoll ADD(1) on fd 0 failed. Old events were 0; read
change was 1 (add); write change was 0 (none): Operation not
permitted"
And the side effect was accumulation of all output in mpirun
memory and actually writing it only at mpirun exit.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
* `-mca mca_base_verbose file:foo` should create an output file with
the suffix `foo`. But since we free the pointer at the end of this
function then by the time we use it it is pointing to invalid memory.
* This commit fixes that corruption
* This commit also fixes the behavior of `file:` with no suffix.
Makes it the same as `file` without the colon.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Add a monitoring PML, OSC and IO. They track all data exchanges between processes,
with capability to include or exclude collective traffic. The monitoring infrastructure is
driven using MPI_T, and can be tuned of and on any time o any communicators/files/windows.
Documentations and examples have been added, as well as a shared library that can be
used with LD_PRELOAD and that allows the monitoring of any application.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* add ability to querry pml monitorinting results with MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Fix a convertion problem and add a comment about the lack of component
retain in the new component infrastructure.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Allow the pvar to be written by invoking the associated callback.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Various fixes for the monitoring.
Allocate all counting arrays in a single allocation
Don't delay the initialization (do it at the first add_proc as we
know the number of processes in MPI_COMM_WORLD)
Add a choice: with or without MPI_T (default).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Cleanup for the monitoring module.
Fixed few bugs, and reshape the operations to prepare for
global or communicator-based monitoring. Start integrating
support for MPI_T as well as MCA monitoring.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Adding documentation about how to use pml_monitoring component.
Document present the use with and without MPI_T.
May not reflect exactly how it works right now, but should reflects
how it should work in the end.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Change rank into MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c.
Change mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Improve monitoring support (including integration with MPI_T)
Use mca_pml_monitoring_enable to check status state. Set mca_pml_monitoring_current_filename iif parameter is set
Allow 3 modes for pml_monitoring_enable_output: - 1 : stdout; - 2 : stderr; - 3 : filename
Fix test : 1 for differenciated messages, >1 for not differenciated. Fix output.
Add documentation for pml_monitoring_enable_output parameter. Remove useless parameter in example
Set filename only if using mpi tools
Adding missing parameters for fprintf in monitoring_flush (for output in std's cases)
Fix expected output/results for example header
Fix exemple when using MPI_Tools : a null-pointer can't be passed directly. It needs to be a pointer to a null-pointer
Base whether to output or not on message count, in order to print something if only empty messages are exchanged
Add a new example on how to access performance variables from within the code
Allocate arrays regarding value returned by binding
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add overhead benchmark, with script to use data and create graphs out of the results
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix segfault error at end when not loading pml
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Start create common monitoring module. Factorise version numbering
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix microbenchmarks script
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Improve readability of code
NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add osc monitoring component
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add error checking if running out of memory in osc_monitoring
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Resolve brutal segfault when double freeing filename
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Moving to ompi/mca/common the proper parts of the monitoring system
Using common functions instead of pml specific one. Removing pml ones.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add calls to record monitored data from osc. Use common function to translate ranks.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix test_overhead benchmark script distribution
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix linking library with mca/common
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add passive operations in monitoring_test
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix from rank calculation. Add more detailed error messages
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix osc_monitoring mget_message_count function call
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Change common_monitoring function names to respect the naming convention. Move to common_finalize the common parts of finalization. Add some comments.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add monitoring common output system
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add error message when trying to flush to a file, and open fails. Remove erroneous info message when flushing wereas the monitoring is already disabled.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Consistent output file name (with and without MPI_T).
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Always output to a file when flushing at pvar_stop(flush).
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Update the monitoring documentation.
Complete informations from HowTo. Fix a few mistake and typos.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Use the world_rank for printf's.
Fix name generation for output files when using MPI_T. Minor changes in benchmarks starting script
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add security check for unique initialization for osc monitoring
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Clean the amout of symbols available outside mca/common/monitoring
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Remove use of __sync_* built-ins. Use opal_atomic_* instead.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Allocate the hashtable on common/monitoring component initialization. Define symbols to set the values for error/warning/info verbose output. Use opal_atomic instead of built-in function in osc/monitoring template initialization.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Deleting now useless file : moved to common/monitoring
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add histogram ditribution of message sizes
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add histogram array of 2-based log of message sizes. Use simple call to reset/allocate arrays in common_monitoring.c
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add informations in dumping file. Separate per category (pt2pt/osc/coll (to come)) monitored data
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add coll component for collectives communications monitoring
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix warning messages : use c_name as the magic id is not always defined. Moreover, there was a % missing. Add call to release underlying modules. Add debug info messages. Add warning which may lead to further analysis.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix log10_2 constant initialization. Fix index calculation for histogram array.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add debug info messages to follow more easily initialization steps.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Group all the var/pvar definitions to common_monitoring. Separate initial filename from the current on, to ease its lifetime management. Add verifications to ensure common is initialized once only. Move state variable management to common_monitoring.
monitoring_filter only indicates if filtering is activated.
Fix out of range access in histogram.
List is not used with the struct mca_monitoring_coll_data_t, so heritate only from opal_object_t.
Remove useless dead code.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix invalid memory allocation. Initialize initial_filename to empty string to avoid invalid read in mca_base_var_register.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Don't install the test scripts.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix missing procs in hashtable. Cache coll monitoring data.
* Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
* Cache monitoring data relative to collectives operations on creation.
* Remove double caching.
* Use same proc name definition for hash table when inserting and
when retrieving.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Use intermediate variable to avoid invalid write while retrieving ranks in hashtable.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add missing release of the last element in flush_all. Add release of the hashtable in finalize.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Use a linked list instead of a hashtable to keep tracks of communicator data. Add release of the structure at finalize time.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Set world_rank from hashtable only if found
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Use predefined symbol from opal system to print int
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Move collective monitoring data to a hashtable. Add pvar to access the monitoring_coll_data. Move functions header to a private file only to be used in ompi/mca/common/monitoring
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix pvar registration. Use OMPI_ERROR isntead of -1 as returned error value. Fix releasing of coll_data_t objects. Affect value only if data is found in the hashtable.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add automated check (with MPI_Tools) of monitoring.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix procs list caching in common_monitoring_coll_data_t
* Fix monitoring_coll_data type definition.
* Use size(COMM_WORLD)-1 to determine max number of digits.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix coll operations monitoring. Update check_monitoring accordingly to the added pvar. Fix monitoring array allocation.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Documentation update.
Update and then move the latex and README documentation to a more logical place
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather).
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof. Unify OSC related outputs.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add the use of a machine file for overhead benchmark
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Check for out-of-bound write in histogram
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Fix common_monitoring_cache object init for MPI_COMM_WORLD
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add RDMA benchmarks to test_overhead
Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2).
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add computation of average and median of overheads. Add comments and copyrigths to the test_overhead script
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add technical documentation
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Adapt to the new definition of communicators
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Update expected output in test/monitoring/monitoring_test.c
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add dumping histogram in edge case
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add consistency in header inclusion.
Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
Add sanity check before emptying hashtable.
Fix typos in documentation.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* misc monitoring fixes
* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am
and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* Cleanups.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Changing int64_t to size_t.
Keep the size_t used accross all monitoring components.
Adapt the documentation.
Remove useless MPI_Request and MPI_Status from monitoring_test.c.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add parameter for RMA test case
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Clean the maximum bound computation for proc list dump.
Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0dbe5.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add communicator-specific monitored collective data reset
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* Add monitoring scripts to the 'make dist'
Also install them in the build and the install directories.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Remove the opal_ignore from the RML/OFI component, but disable that component unless the user specifically requests it via the "rml_ofi_desired=1" MCA param. This will let us test compile in various environments without interfering with operations while we continue to debug
Fix an error when computing the number of infos during server init
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
The new info infrastructure introduced an abstration break by
including mpi.h and using MPI_ constants in opal. This commit fixes
the break by changing the constants to their opal equivalents.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Based on an idea from Brian move the libevent trigger update to a later
stage instead of the generic add/del procs. So, we are doing the
increment/decrement when we register the recv handler for an endpoint,
so basically when we create and connect a socket to a peer. The benefit
is that as long as TCP is not used, there should be no impact on the
performance of other BTLs. The drawback is that the first TCP connection
will be slightly slower, but then once we have a peer connected over
TCP things go back to normal.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This commit cleans up code in opal to use OPAL_LIST_FOREACH(_SAFE),
OPAL_LIST_DESTRUCT, and OPAL_LIST_RELEASE.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This now passes the loop test, and so we believe it resolves the random hangs in finalize.
Changes in PMIx master that are included here:
* Fixed a bug in the PMIx_Get logic
* Fixed self-notification procedure
* Made pmix_output functions thread safe
* Fixed a number of thread safety issues
* Updated configury to use 'uname -n' when hostname is unavailable
Work on cleaning up the event handler thread safety problem
Rarely used functions, but protect them anyway
Fix the last part of the intercomm problem
Ensure we don't cover any PMIx calls with the framework-level lock.
Protect against NULL argv comm_spawn
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Hostname and PID are output as a message prefix in many places in
our code. Their printf-formats were either `[%s:%d]` or `[%s:%05d]`.
This commit changes `[%s:%d]` to `[%s:%05d]`. The latter was more
widely used in our code (including OPAL output system and the signal
handler).
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
This commit expands the effect of the MCA parameter `opal_abort_delay`
to the OPAL signal handler. This allows attaching of a debugger on
segmentation fault etc. before quitting the job.
The sleep code is moved to the `opal_delay_abort` function from the
`ompi_mpi_abort` and `oshmem_shmem_abort` functions for code cleanup.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
According to Section 6.4.4.1 of the C, we do not need to prepend a type
to a constant to get the right size. The compiler will infer the type
according to the number of bits in the constant.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Parts of the pmix2x component called the event_* functions directly
instead of the opal_event_* wrappers. This is fine as long as we are
using libevent but becomes a problem with other event libraries.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it).
Remove unneeded test
Fix memory corruption by re-initializing variable to NULL in loop
Resolve the race condition identified by @ggouaillardet by resetting the
mapped flag within the same event where it was set. There is no need to
retain the flag beyond that point as it isn't used again.
Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them.
Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers.
Have the mindist module add procs to the job's proc array as it is a fully described module
Protect the hnp-not-in-allocation case
Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Finally Merging this in. MPI_*_get_info/set_info().
Targeting v3.1 release. @hjelmn were you interested in switching some internal pieces to begin using this? Should we target v3.1 (or whatever we call the Oct 15th release?)
The expected sequence of events for processing info during object creation
is that if there's an incoming info arg, it is opal_info_dup()ed into the obj
at obj->s_info first. Then interested components register callbacks for
keys they want to know about using opal_infosubscribe_infosubscribe().
Inside info_subscribe_subscribe() the specified callback() is called with
whatever matching k/v is in the object's info, or with the default. The
return string from the callback goes into the new k/v stored in info, and
the input k/v is saved as __IN_<key>/<val>. It's saved the same way
whether the input came from info or whether it was a default. A null return
from the callback indicates an ignored key/val, and no k/v is stored for
it, but an __IN_<key>/<val> is still kept so we still have access to the
original.
At MPI_*_set_info() time, opal_infosubscribe_change_info() is used. That
function calls the registered callbacks for each item in the provided info.
If the callback returns non-null, the info is updated with that k/v, or if
the callback returns null, that key is deleted from info. An __IN_<key>/<val>
is saved either way, and overwrites any previously saved value.
When MPI_*_get_info() is called, opal_info_dup_mpistandard() is used, which
allows relatively easy changes in interpretation of the standard, by looking
at both the <key>/<val> and __IN_<key>/<val> in info. Right now it does
1. includes system extras, eg k/v defaults not expliclty set by the user
2. omits ignored keys
3. shows input values, not callback modifications, eg not the internal values
Currently the callbacks are doing things like
return some_condition ? "true" : "false"
that is, returning static strings that are not to be freed. If the return
strings start becoming more dynamic in the future I don't see how unallocated
strings could support that, so I'd propose a change for the future that
the callback()s registered with info_subscribe_subscribe() do a strdup on
their return, and we change the callers of callback() to free the strings
it returns (there are only two callers).
Rough outline of the smaller changes spread over the less central files:
comm.c
initialize comm->super.s_info to NULL
copy into comm->super.s_info in comm creation calls that provide info
OBJ_RELEASE comm->super.s_info at free time
comm_init.c
initialize comm->super.s_info to NULL
file.c
copy into file->super.s_info if file creation provides info
OBJ_RELEASE file->super.s_info at free time
win.c
copy into win->super.s_info if win creation provides info
OBJ_RELEASE win->super.s_info at free time
comm_get_info.c
file_get_info.c
win_get_info.c
change_info() if there's no info attached (shouldn't happen if callbacks
are registered)
copy the info for the user
The other category of change is generally addressing compiler warnings where
ompi_info_t and opal_info_t were being used a little too interchangably. An
ompi_info_t* contains an opal_info_t*, at &(ompi_info->super)
Also this commit updates the copyrights.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
ompi_communicator_t, ompi_win_t, ompi_file_t all have a super class of type opal_infosubscriber_t instead of a base/super type of opal_object_t (in previous code comm used c_base, but file used super). It may be a bit bold to say that being a subscriber of MPI_Info is the foundational piece that ties these three things together, but if you object, then I would prefer to turn infosubscriber into a more general name that encompasses other common features rather than create a different super class. The key here is that we want to be able to pass comm, win and file objects as if they were opal_infosubscriber_t, so that one routine can heandle all 3 types of objects being passed to it.
MPI_INFO_NULL is still an ompi_predefined_info_t type since an MPI_Info is part of ompi but the internal details of the underlying information concept is part of opal.
An ompi_info_t type still exists for exposure to the user, but it is simply a wrapper for the opal object.
Routines such as ompi_info_dup, etc have all been moved to opal_info_dup and related to the opal directory.
Fortran to C translation tables are only used for MPI_Info that is exposed to the application and are therefore part of the ompi_info_t and not the opal_info_t
The data structure changes are primarily in the following files:
communicator/communicator.h
ompi/info/info.h
ompi/win/win.h
ompi/file/file.h
The following new files were created:
opal/util/info.h
opal/util/info.c
opal/util/info_subscriber.h
opal/util/info_subscriber.c
This infosubscriber concept is that communicators, files and windows can have subscribers that subscribe to any changes in the info associated with the comm/file/window. When xxx_set_info is called, the new info is presented to each subscriber who can modify the info in any way they want. The new value is presented to the next subscriber and so on until all subscribers have had a chance to modify the value. Therefore, the order of subscribers can make a difference but we hope that there is generally only one subscriber that cares or modifies any given key/value pair. The final info is then stored and returned by a call to xxx_get_info.
The new model can be seen in the following files:
ompi/mpi/c/comm_get_info.c
ompi/mpi/c/comm_set_info.c
ompi/mpi/c/file_get_info.c
ompi/mpi/c/file_set_info.c
ompi/mpi/c/win_get_info.c
ompi/mpi/c/win_set_info.c
The current subscribers where changed as follows:
mca/io/ompio/io_ompio_file_open.c
mca/io/ompio/io_ompio_module.c
mca/osc/rmda/osc_rdma_component.c (This one actually subscribes to "no_locks")
mca/osc/sm/osc_sm_component.c (This one actually subscribes to "blocking_fence" and "alloc_shared_contig")
Signed-off-by: Mark Allen <markalle@us.ibm.com>
Conflicts:
AUTHORS
ompi/communicator/comm.c
ompi/debuggers/ompi_mpihandles_dll.c
ompi/file/file.c
ompi/file/file.h
ompi/info/info.c
ompi/mca/io/ompio/io_ompio.h
ompi/mca/io/ompio/io_ompio_file_open.c
ompi/mca/io/ompio/io_ompio_file_set_view.c
ompi/mca/osc/pt2pt/osc_pt2pt.h
ompi/mca/sharedfp/addproc/sharedfp_addproc.h
ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c
ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c
ompi/mpi/c/lookup_name.c
ompi/mpi/c/publish_name.c
ompi/mpi/c/unpublish_name.c
opal/mca/mpool/base/mpool_base_alloc.c
opal/util/Makefile.am
This change has the side effect of improving the performance of all
atomic data structures (in addition to making the code crrect under a
certain interpretation of the volatile usage).
This commit fixes#3450.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Don't overflow the internal datatype count.
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Optimize the datatype creation.
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment, or when
one call get_elements. It saves space and makes the convertor creation a
little faster in some cases.
Rearrange the fields in the datatype description structs.
The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Fix the boundary computation.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* test/datatype: add test for short unpack on heteregeneous cluster
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Trying to reduce the cost of creating a convertor.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Respect the unpack boundaries.
As Gilles suggested on #2535 the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes#2535.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Set the default send and receive socket buffer size to 0,
which means Open MPI will not try to set a buffer size during
startup.
The default behavior since near day one of the TCP BTL has been
to set the send and receive socket buffer sizes to 128 KiB. A
number that works great on 1 GbE, but not so great on 10 GbE
fabrics of any real size. Modern TCP stacks, particularly on
Linux, have gotten much smarter about buffer sizes and are much
less efficient if a buffer size is set (even if set to something
large).
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The direct modex operation is slow, especially at scale for even modestly-connected applications. Likewise, blocking in MPI_Init while we wait for a full modex to complete takes too long. However, as George pointed out, there is a middle ground here. We could kickoff the modex operation in the background, and then trap any modex_recv's until the modex completes and the data is delivered. For most non-benchmark apps, this may prove to be the best of the available options as they are likely to perform other (non-communicating) setup operations after MPI_Init, and so there is a reasonable chance that the modex will actually be done before the first modex_recv gets called.
Once we get instant-on-enabled hardware, this won't be necessary. Clearly, zero time will always out-perform the time spent doing a modex. However, this provides a decent compromise in the interim.
This PR changes the default settings of a few relevant params to make "background modex" the default behavior:
* pmix_base_async_modex -> defaults to true
* pmix_base_collect_data -> continues to default to true (no change)
* async_mpi_init - defaults to true. Note that the prior code attempted to base the default setting of this value on the setting of pmix_base_async_modex. Unfortunately, the pmix value isn't set prior to setting async_mpi_init, and so that attempt failed to accomplish anything.
The logic in MPI_Init is:
* if async_modex AND collect_data are set, AND we have a non-blocking fence available, then we execute the background modex operation
* if async_modex is set, but collect_data is false, then we simply skip the modex entirely - no fence is performed
* if async_modex is not set, then we block until the fence completes (regardless of collecting data or not)
* if we do NOT have a non-blocking fence (e.g., we are not using PMIx), then we always perform the full blocking modex operation.
* if we do perform the background modex, and the user requested the barrier be performed at the end of MPI_Init, then we check to see if the modex has completed when we reach that point. If it has, then we execute the barrier. However, if the modex has NOT completed, then we block until the modex does complete and skip the extra barrier. So we never perform two barriers in that case.
HTH
Ralph
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
The usNIC BTL does not use more than 1 iov, so be sure to set it to 1
so that we don't allocate cq/rq/sq entries based on a default (i.e.,
>1) number of iovs per entry.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This PR renames the common library for OFI libfabric from
libfabric to ofi. There are a number of reasons this
is good to do:
1) its shorter and replaces 9 characters with three for
function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
the term "ofi" internally and also in their configure options
relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)
There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.
This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This commit recategorizes several mpirun arguments,
and moves the information for mpirun --help arguments
to the bottom of the general help message. I also
added the OPAL_CMD_LINE_OTYPE field to two commands
that were missed initially because they were not
in the same area as the others.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
since Open MPI now requires a C99, and ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.
Thanks George, Nathan and Paul for the help.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
since Open MPI now requires a C99, and ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* Complete rewrite of opal_pointer_array
Instead of a cache oblivious linear search use a bits array
to speed up the management of the free space. As a result we
slightly increase the memory used by the structure, but we get a
significant boost in performance.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Do not register datatypes in the f2c translation table.
The registration is now done up into the Fortran layer, by
forcing a call to MPI_Type_c2f.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
With this, libs (e.g., "-ldl") are not added to the wrapper LIBS
flags. This may work on some platforms, but on at least RHEL 7.3, it
does not (i.e., compiling MPI applications fails because it can't find
dlopen).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
* `ompi_info --show-failed` will include the failed components along
with information about why they failed.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* Add a path for failed component load information to be reported up.
* This allows ompi_info to display this information inline to make it
easier for folks to see if the component is present but failed for
some reason. Most likely a missing library, but could be a libnl
conflict.
* Add MCA parameter to enable this feature:
- `mca_base_component_track_load_errors` takes a boolean
- Default: `false`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Adds:
- enabling/disabling of timings throught environment variable `OMPI_TIMING_ENABLE`
- output format: [file name]:[function name]:[description]: avg/min/max
- dynamically extending array of results for case then inited size was exhausted
- catch and collect errors
- cleanup
Note:
For use feature need to configure with `--enable-timings`
and set env `OMPI_TIMING_ENABLE = 1`
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
This commit adds new timing feature that uses environment variables to
expose timing information. This allows easy access to this data (if
timing is enabled) from any other part of the application for the subsequent
postprocessing.
In particular this will be integrated with OMPI-level timing framework that
whill use MPI_Reduce functionality to provide more compact and easy-to use
information.
This commit also adds the example of usage of this framework by annotating
rte_init function. The result is not used anywhere for now. It will be
postprocessed in subsequent commits.
NOTE: that functionality is currently disabled untill it will be verified at runtime
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
This commit adds a "parsable" option to the help
arguments, which prints out a machine readable
list of all the mpirun options.
Fixes#3279
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
The key change was in btl_openib_connect_udcm.c where a buffer was
being pinned with size 65664 (whether openib was being used or not).
The start of the buffer was page aligned, but because of the size
the end wasn't. That makes it too easy for a forked child to accidentally
touch pinned memory on the same page as the end of that buffer.
So this change increases the size of the allocated buffer to use the
rest of the page.
I inspected the rest of the ibv_reg_mr() calls and changed one other
place to page align its buffer too, although I think the above is
the one that really matters.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
This commit modifies the output from the mpirun --help
command. The options have been split into groups, to
make the output smaller and more readable. The groups
are: general, debug, output, input, mapping, ranking,
binding, devel, compatibility, launch, dvm, and
unsupported. There is also a special "full" command
that can be used to get the old behaviour of printing
out all of the options. Unsupported options may only
be seen with this full output.
This commit also adds a special case for the help
argument. It makes it possible for the user to
enter 0 or 1 arguments instead of having to always
enter an argument. This defaults to printing out
the "general" help options so the user can then
see what help arguments there are.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap
Get the DVM running again
Fix direct modex by eliminating race condition caused by releasing data while sending it
Up the size limit before compressing
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Commit fec519a793 broke the ability to
run autogen.pl in a distribution tarball. This commit restores that
ability by also distributing opal/mca/hwloc/autogen.options in the
tarball.
Skipping CI because CI does not test this functionality:
[skip ci]
bot:notest
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Several component-specific functions were named with a prefix of
"opal_timer_base", which was quite confusing. Rename them to have a
prefix "opal_timer_linux" to make it clear that they are here in this
component (and different than *actual* opal_timer_base symbols).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit makes datagram checks time based and reduces their
frequency when only the wildcard datagram is posted. This change
improves latency on knl systems.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit removes the local operation count check from the local SMSG
completion queue. This check was leading to hangs due to an undocumented
feature of the ugni library. The local SMSG CQ is used to send credit
return messages back to the sender. The ugni library never checks for
the completion itself but relying on the SMSG user to periodically
check the CQ.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit updates the ugni btl to make use of multiple device
contexts to improve the multi-threaded RMA performance. This commit
contains the following:
- Cleanup the endpoint structure by removing unnecessary field. The
structure now also contains all the fields originally handled by the
common/ugni endpoint.
- Clean up the fragment allocation code to remove the need to
initialize the my_list member of the fragment structure. This
member is not initialized by the free list initializer function.
- Remove the (now unused) common/ugni component. btl/ugni no longer
need the component. common/ugni was originally split out of
btl/ugni to support bcol/ugni. As that component exists there is no
reason to keep this component.
- Create wrappers for the ugni functionality required by
btl/ugni. This was done to ease supporting multiple device
contexts. The wrappers are thread safe and currently use a spin
lock instead of a mutex. This produces better performance when
using multiple threads spread over multiple cores. In the future
this lock may be replaced by another serialization mechanism. The
wrappers are located in a new file: btl_ugni_device.h.
- Remove unnecessary device locking from serial parts of the ugni
btl. This includes the first add-procs and module finalize.
- Clean up fragment wait list code by moving enqueue into common
function.
- Expose the communication domain flags as an MCA variable. The
defaults have been updated to reflect the recommended setting for
knl and haswell.
- Avoid allocating fragments for communication with already
overloaded peers.
- Allocate RDMA endpoints dyncamically. This is needed to support
spreading RMA operations accross multiple contexts.
- Add support for spreading RMA communication over multiple ugni
device contexts. This should greatly improve the threading
performance when communicating with multiple peers. By default the
number of virtual devices depends on 1) whether
opal_using_threads() is set, 2) how many local processes are in the
job, and 3) how many bits are available in the pid. The last is
used to ensure that each CDM is created with a unique id.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit exposes ugni statistics for use with MPI_T. There is
no overhead to providing these counters.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hwloc v1.5 does not support HWLOC_OBJ_OSDEV_COPROC
nor hwloc_topology_dup(), so for this version :
- do not search for coprocessors
- do not try hwloc_topology_dup(), note this is not
used anywhere in the code base
Thanks Jeff for helping with the wording
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This is mostly for error cases, where we need to release the
newly created proc. Currently the code deadlocks because the endpoint
lock is help at the release and the lock is not recursive.
Aslo added some code to print the IP addresses that don't match during
the TCP connection step.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).
This commit:
1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
rest of the code base.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
* Include a 'demo' component that shows some of the features.
* Currently has hooks for:
- MPI_Initialized
- top, bottom
- MPI_Init_thread
- top, bottom
- MPI_Finalized
- top, bottom
- MPI_Init
- top (pre-opal_init), top (post-opal_init), error, bottom
- MPI_Finalize
- top, bottom
* Other places in ompi can 'register' to hook into any one of these places
by passing back a component structure filled with function pointers.
* Add a `MCA_BASE_COMPONENT_FLAG_REQUIRED` flag to the MCA structure that
is checked by the `hook` framework. If a required, static component has
been excluded then the `hook` framework will fail to initialize.
- See note in `opal/mca/mca.h` as to why this is checked in the `hook`
framework and not in `opal/mca/base/mca_base_component_find.c`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
in this context, AMD64 really means amd64 or em64t, so let's
rename this into X86_64 in order to avoid any confusion
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Return value in comment about opal_list_item_compare_fn_t typedef when a < b is indicated to be 11 instead of -1.
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
This commit makes the vma tree garbage collection list a lifo. This
way we can avoid having to hold any lock when releasing vmas. In
theory this should finally fix the hold-and-wait deadlock detailed
in #1654.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Allow someone to specify the "pe=N" modifier to a mapping policy when N=1. This equates to just "bind-to core", but helps people who use a script to set the PE policy. Fix a bug where setting the binding policy left a lingering "if-supported" flag that shouldn't be there.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Cleanup a race condition segfault during finalize by ensuring the PMIx progress thread is stopped prior to starting to tear down the messaging components
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Do not use opal_output_verbose inside O(n) loops. This was causing us
to make O(n) calls to snprintf which was greatly slowing launch at
scale.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds a new base enumerator type for variables that take of
the values -1, 0, and 1. These values are mapped to the strings auto,
false, true. This commit updates the mpi_leave_pinned MCA variable to
use the new enumerator.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Add a verbose to show all the failed attempts to match the
remote interfaces based on the modex info.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
It's possible that we can have zlib.h but still not have zlib support.
Use the correct macro to protect the usage of calling zlib functions.
This fixes 32-bit MTT builds at Cisco (e.g.,
https://mtt.open-mpi.org/index.php?do_redir=2389).
Submitted upstream to PMIX: https://github.com/pmix/master/pull/290
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
PMIx_server_register_nspace() is an asynchronous operation, so
the pmix glue wait for it completes before returning.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
the argc field from the opal_pmix_app_t struct was removed,
so adjust the pmix/ext11 glue accordingly.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
- New MCA option: opal_stacktrace_output
- Specifies where the stack trace output stream goes.
- Accepts: none, stdout, stderr, file[:filename]
- Default filename 'stacktrace'
- Filename will be `stacktrace.PID`, or if VPID is available,
then the filename will be `stacktrace.VPID.PID`
- Update util/stacktrace to allow for different output avenues
including files. Previously this was hardcoded to 'stderr'.
- Since opal_backtrace_print needs to be signal safe, passing it a
FILE object that actually represents a file stream is difficult. This
is because we cannot open the file in the signal handler using
`fopen` (not safe), but have to use `open` (safe). Additionally, we
cannot use `fdopen` to convert the `int fd` to a `FILE *fh` since it
is also not signal safe.
- I did not want to break the backtrace.h API so I introduced a new
rule (documented in `backtrace.c`) that if the `FILE *file`
argument is `NULL` then look for the `opal_stacktrace_output_fileno`
variable to tell you which file descriptor to use for output.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Since the oob and connections systems do not work the same way they
did in older versions of Open MPI these operations are no longer
necessary. At best they do nothing and at worst they hurt performance
by making us enter the event library more often in opal_progress().
Fixes#2839
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
When Java bindings are used, MPI_Init() is not invoked
by the main thread, and this causes some keys being destructed twice.
Reset the per thread values to NULL in order to correctly handle this
Fixesopen-mpi/ompi#2811
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* Similar to `orte_map_stddiag_to_stderr` except it redirects `stddiag`
to `stdout` instead of `stderr`.
* Add protection so that the user canot supply both:
- `orte_map_stddiag_to_stderr`
- `orte_map_stddiag_to_stdout`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
- This allows the following MCA option to have an impact on the
framework verbose output as well.
* `-mca mca_base_verbose stdout`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>