
407 Commits

Author SHA1 Message Date
Ralph Castain
125d236173 Move from the use of regex to compression
We've been fighting the battle of trying to create a regex generator and
parser that can handle arbitrary hostname schemes - without long-term
success. The worst of it is that there is no way of checking to see if
the computed regex is correct short of parsing it and doing a
character-by-character comparison with the original string. Ugh...there
has to be a better solution.

One option is to investigate using 3rd-party regex libraries as
those are coming from communities whose sole focus is resolving that
problem. However, someone would need to spend the time to investigate
it, and we'd have to find a license-friendly implementation.

Another option is to quit beating our heads against the wall and just
compress the information. It won't be as much of a reduction, but we
also won't keep hitting scenarios where things break. In this case, it
seems that "perfection" is definitely the enemy of "good enough".

This PR implements the compression option while retaining the
possibility of people adding regex-generating components. The
compression code used in ORTE is consolidated into the opal/compress
framework. That framework previously held bzip and gzip components for
use in compressing checkpoint files - since we no longer support C/R, I
have .opal_ignore'd those components.

However, I have left the original framework APIs alone in case someone
ever decides to redo C/R. The APIs of interest here are added to the
framework - specifically, the "compress_block" and "decompress_block"
functions. I then moved the ORTE zlib compression code into a new
component in this framework.

Unfortunately, the framework currently is a single-select one - i.e.,
only one active component at a time. Since I .opal_ignore'd the other
two and made the priority of zlib high, this isn't a problem. However,
if someone wants to re-enable bzip/gzip or add another component, they
might need to transition opal/compress to a multi-select framework.

Included changes:

* Consolidate the compression code into the opal/compress framework

* Move the ORTE zlib compression code into a new opal/compress/zlib
  component

* Ignore the bzip and gzip components in opal/compress framework

* Add a "compress_base_limit" MCA param to set the threshold above which
  we compress data - defaults to 4096 bytes

* Delete stale brucks and rcd components from orte/grpcomm framework

* Delete the orte/regx framework

* Update the launch system to use opal/compress instead of string regex

* Provide a default module if no zlib is available

* Fix some misc multi-node issues

* Properly generate the nidmap in response to a "connection warmup"
  message so the remote daemon knows the children it needs to launch.

* Remove stale references to orte_node_regex

* opal_byte_object_t's are not OPAL objects - properly release allocated
  memory.

* Set the topology

* Currently only handling homogeneous case

* Update the compress framework files to conform

* Consolidate open/close into one "frame" file. Ensure we open/close the
  framework

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-08 11:11:14 -08:00
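
The threshold behavior described in commit 125d236173 (compress only above "compress_base_limit", default 4096 bytes) can be sketched as follows. This is a minimal illustration in Python using the standard zlib module, not the ORTE/OPAL C code: the function names only mirror the "compress_block" and "decompress_block" API names from the commit message, and the signatures, return values, and the fallback when compression does not shrink the data are assumptions.

```python
import zlib

# Mirrors the stated default of the compress_base_limit MCA param.
COMPRESS_LIMIT = 4096


def compress_block(data: bytes, limit: int = COMPRESS_LIMIT):
    """Return (was_compressed, payload).

    Hypothetical signature: data at or below the limit is passed
    through untouched, as is data that zlib cannot shrink.
    """
    if len(data) <= limit:
        return False, data
    packed = zlib.compress(data)
    if len(packed) >= len(data):
        return False, data
    return True, packed


def decompress_block(was_compressed: bool, payload: bytes) -> bytes:
    """Invert compress_block using the flag it returned."""
    return zlib.decompress(payload) if was_compressed else payload


if __name__ == "__main__":
    # A long, repetitive host list of the kind a regex would otherwise encode.
    hosts = ",".join(f"node{i:05d}" for i in range(2000)).encode()
    flag, payload = compress_block(hosts)
    print(flag, len(hosts), len(payload))
    assert decompress_block(flag, payload) == hosts
```

Unlike a generated regex, the result can be verified by a plain round-trip comparison, which is the checking problem the commit message complains about.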
Jeff Squyres
f96c04244d odls_base_default_fns.c: put the free() in the right place
Fixes CID 1441826.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-12-22 06:40:05 -08:00
Ralph Castain
d728380741 If job is fully described, there will be no ppn string to unpack
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-12-17 16:13:55 -08:00
Ralph Castain
647a760b7e Ensure SIGCHLD is unblocked
Thanks to @hjelmn for debugging it and providing the patch

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit efa8bcc17078c89f1c9d6aabed35c90973a469bf)
2018-10-15 21:03:17 -07:00
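
For readers unfamiliar with the issue behind commit 647a760b7e: a process that inherits a signal mask with SIGCHLD blocked will not be notified when its children exit. A minimal sketch of the idea in Python (the actual fix is in C and may differ; the helper names here are hypothetical):

```python
import signal


def ensure_sigchld_unblocked():
    """Remove SIGCHLD from the calling thread's blocked-signal mask."""
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGCHLD})


def sigchld_is_blocked():
    """Report whether SIGCHLD is currently blocked.

    Calling pthread_sigmask with SIG_BLOCK and an empty set changes
    nothing and returns the current mask.
    """
    return signal.SIGCHLD in signal.pthread_sigmask(signal.SIG_BLOCK, set())
```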
Ralph H Castain
fc81d0d519 Replace asprintf with opal_asprintf
Silence the flood of warnings from ORTE

Signed-off-by: Ralph H Castain <rhc@open-mpi.org>
2018-10-06 19:32:37 +00:00
Ralph Castain
cfdd08d309 Remove stale ORTE code
Functionality moved to PMIx

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-02 11:55:36 -07:00
Ralph Castain
7facb3f3e9 Pickup and deploy network-specific envars
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-13 09:24:19 -07:00
Ralph Castain
f18954d2d5 Update ORTE to allocate network resources
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-13 08:54:39 -07:00
Ralph Castain
bc1d13ffbe Remove the orte_enable_instant_on MCA param
We have adequate protection to ensure that we only utilize the PMIx
features related to "instant on" when they are available, so this param
is no longer required and causes confusion.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-10 09:20:26 -07:00
Ralph Castain
795140e590 Make use of "instant-on" feature optional
The PMIx support for "instant on" remains experimental, so disable it by default. Provide an MCA param and corresponding command line option to enable it at runtime.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-17 02:42:00 -07:00
Ralph Castain
fa18ba395d Sync to latest PMIx v3.0rc
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-17 02:41:46 -07:00
Ralph Castain
ea21f7175a Silence warnings and remove unused code
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-16 17:42:48 -07:00
Gilles Gouaillardet
4f1cb4747c odls/base: fix support for PMIx < v2.1
wrap opal_pmix.get() around opal_pmix.legacy_get() to support previous PMIx releases.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-04-17 09:59:47 +09:00
Ralph Castain
7241043809 Modify the internal logic for resolve nodes/peers
The current code path for PMIx_Resolve_peers and PMIx_Resolve_nodes executes a threadshift in the preg components themselves. This is done to ensure thread safety when called from the user level. However, it causes thread-stall when someone attempts to call the regex functions from _inside_ the PMIx code base should the call occur from within an event.

Accordingly, move the threadshift to the client-level functions and make the preg components just execute their algorithms. Create a new pnet/test component to verify that the preg code can be safely accessed - set that component to be selected only when the user directly specifies it. The new component will be used to validate various logical extensions during development, and can then be discarded.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 456ac7f7af3d9ba09888e3c899eb001daaa24aef)
2018-03-02 02:00:31 -08:00
Ralph Castain
0434b615b5 Update ORTE to support PMIx v3
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":

* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch

* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.

* Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later.

* Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allows envars to be excluded.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-02 02:00:31 -08:00
Ralph Castain
b643852d8a Properly terminate the job when executable not found
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-26 12:09:24 -08:00
Ralph Castain
e9cd7fd7e6 Update orte
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:53:43 -08:00
Ralph Castain
75eb56522c Continue resolving add_host behavior
Fix a problem in packing/unpacking job updates. There remains a race condition that causes messages to attempt to be sent to the second new daemon before it is completely ready. Not entirely sure where it is coming from.

Refs #4665

Rebase to master. Reset orte_nidmap_communicated if hosts are added. Check for duplicate hostnames in an add_host command. Turn off tree_spawn for dynamic launch of additional daemons.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-15 08:21:01 -08:00
Ralph Castain
4cd7f3b202 Convert nidmap to regx framework
Handle the need for different regex generator/parsers by moving the
orte/util/nidmap and orte/util/regex code into a new "regx" framework.
Use the original code to complete a "fwd" component, and create a
scaffold for IBM's "reverse" component.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-10 20:28:21 -08:00
Ralph Castain
9a7b0d8d9c
Merge pull request #4586 from rhc54/topic/addhosts
Fix add-host support by including the location for procs of prior jobs when spawning new daemons.
2017-12-12 12:45:57 -08:00
Ralph Castain
4316213805 Fix add-host support by including the location for procs of prior jobs when spawning new daemons.
Thanks to CalugaruVaxile for the report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-07 14:48:58 -08:00
Ralph Castain
ee2a93cb2e Ensure we don't send a kill signal to pid=0 as that hits ourselves and initiates an infinite loop.
Thanks to Michael Fenn for the report.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-07 10:38:11 -08:00
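
The hazard fixed in commit ee2a93cb2e comes from POSIX semantics: kill() with pid 0 signals every process in the caller's own process group, the caller included. A minimal guard, sketched in Python with a hypothetical helper name (the ORTE fix itself is in C and is not shown in this log); rejecting negative pids as well is an added assumption, since those also address process groups:

```python
import os
import signal


def safe_kill(pid: int, sig: int) -> bool:
    """Send sig to pid, refusing pids that address a process group.

    pid == 0 would hit the caller's own process group; negative pids
    address other groups (or, for -1, every process we may signal).
    """
    if pid <= 0:
        return False
    os.kill(pid, sig)
    return True


if __name__ == "__main__":
    # Signal 0 performs the permission/existence check without delivering anything.
    print(safe_kill(os.getpid(), 0))
    print(safe_kill(0, signal.SIGTERM))
```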
Gilles Gouaillardet
4a481f66e6 odls/base: fix orte_odls_base_harvest_threads()
Do not try to finalize odls progress threads if they have not been started yet

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-12-04 15:18:04 +09:00
Gilles Gouaillardet
3496897961 odls/base: fix handling of the odls_base_num_threads MCA param
If a number of odls threads is explicitly required, then use
that number no matter what.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-12-04 11:19:25 +09:00
Ralph Castain
335fc96f42 Remove debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-29 21:21:35 -08:00
Ralph Castain
8f496b01b7 Try automatically adding local spawn threads to parallelize the fork/exec process to speed up the launch on large SMPs. Harvest the threads after initial spawn to minimize any impact on running jobs.
Change the determination of #spawn threads to be done on basis of #local procs in first job being spawned. Someone can look at an optimization that handles subsequent dynamic spawns that might be larger in size.

Leave the threads running, but blocked, for the life of the daemon, and use them to harvest the local procs as they terminate. This helps short-lived jobs in particular.

Add MCA params to set:
  * max number of spawn threads (default: 4)
  * set a specific number of spawn threads (default: -1, indicating no set number)
  * cutoff - minimum number of local procs before using spawn threads (default: 32)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-29 19:54:00 -08:00
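
How the three MCA params from commit 8f496b01b7 might interact can be sketched as below. Only three things come from this log: the defaults (max 4, specific number -1, cutoff 32), the meaning of the cutoff, and the rule from the later commit 3496897961 that an explicitly requested number always wins. The scaling rule between the cutoff and the maximum (one thread per `cutoff` local procs) and all names are assumptions made purely for illustration.

```python
def num_spawn_threads(num_local_procs: int,
                      max_threads: int = 4,
                      num_threads: int = -1,
                      cutoff: int = 32) -> int:
    """Pick a spawn-thread count (hypothetical helper, not the ORTE code)."""
    # An explicitly requested number is used no matter what.
    if num_threads >= 0:
        return num_threads
    # Below the cutoff, spawn threads are not used at all.
    if num_local_procs < cutoff:
        return 0
    # ASSUMPTION: scale with the number of local procs, capped at the maximum.
    return min(max_threads, num_local_procs // cutoff)
```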
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Ralph Castain
b225366012 Bring the ofi/rml component online by completing the wireup protocol for the daemons. Cleanup the current confusion over how connection info gets created and passed to make it all flow thru the opal/pmix "put/get" operations. Update the PMIx code to latest master to pickup some required behaviors.

Remove the no-longer-required get_contact_info and set_contact_info from the RML layer.

Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing will be beneficial at large scales. Leave it "off" by default.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 21:01:57 -07:00
Mark Allen
552216f9ba scripted symbol name change (ompi_ prefix)
Passed the below set of symbols into a script that added ompi_ to them all.

Note that if processing a symbol named "foo" the script turns
    foo  into  ompi_foo
but doesn't turn
    foobar  into  ompi_foobar

But beyond that the script is blind to C syntax, so it hits strings and
comments etc as well as vars/functions.

    coll_base_comm_get_reqs
    comm_allgather_pml
    comm_allreduce_pml
    comm_bcast_pml
    fcoll_base_coll_allgather_array
    fcoll_base_coll_allgatherv_array
    fcoll_base_coll_bcast_array
    fcoll_base_coll_gather_array
    fcoll_base_coll_gatherv_array
    fcoll_base_coll_scatterv_array
    fcoll_base_sort_iovec
    mpit_big_lock
    mpit_init_count
    mpit_lock
    mpit_unlock
    netpatterns_base_err
    netpatterns_base_verbose
    netpatterns_cleanup_narray_knomial_tree
    netpatterns_cleanup_recursive_doubling_tree_node
    netpatterns_cleanup_recursive_knomial_allgather_tree_node
    netpatterns_cleanup_recursive_knomial_tree_node
    netpatterns_init
    netpatterns_register_mca_params
    netpatterns_setup_multinomial_tree
    netpatterns_setup_narray_knomial_tree
    netpatterns_setup_narray_tree
    netpatterns_setup_narray_tree_contigous_ranks
    netpatterns_setup_recursive_doubling_n_tree_node
    netpatterns_setup_recursive_doubling_tree_node
    netpatterns_setup_recursive_knomial_allgather_tree_node
    netpatterns_setup_recursive_knomial_tree_node
    pml_v_output_close
    pml_v_output_open
    intercept_extra_state_t
    odls_base_default_wait_local_proc
    _event_debug_mode_on
    _evthread_cond_fns
    _evthread_id_fn
    _evthread_lock_debugging_enabled
    _evthread_lock_fns
    cmd_line_option_t
    cmd_line_param_t
    crs_base_self_checkpoint_fn
    crs_base_self_continue_fn
    crs_base_self_restart_fn
    event_enable_debug_output
    event_global_current_base_
    event_module_include
    eventops
    sync_wait_mt
    trigger_user_inc_callback
    var_type_names
    var_type_sizes

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-07-11 02:13:23 -04:00
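
The renaming behavior described in commit 552216f9ba (whole symbols only, blind to C syntax) can be reproduced with a word-boundary regex. The original script is not part of this log, so the following Python sketch is an assumption about how such a script could work, with a hypothetical function name:

```python
import re


def add_prefix(source: str, symbols, prefix: str = "ompi_") -> str:
    """Prefix every whole-word occurrence of the given symbols.

    Like the script described in the commit, this is blind to C syntax:
    strings and comments are rewritten along with vars/functions.
    """
    # Longest symbols first so alternation never prefers a shorter match.
    ordered = sorted(symbols, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(s) for s in ordered) + r")\b")
    return pattern.sub(lambda m: prefix + m.group(1), source)


if __name__ == "__main__":
    line = 'mpit_lock(); /* mpit_lock */ mpit_lock_all();'
    print(add_prefix(line, ["mpit_lock"]))
```

Because `_` counts as a word character, an already-prefixed `ompi_mpit_lock` is not matched again, so a second run of this sketch leaves the text unchanged.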
Ralph Castain
206aec6083 By default, apply signals to all direct children _and_ any children they might have spawned (so long as they remain in the same process group). Provide an MCA param (odls_base_signal_direct_children_only) to indicate that the signal is to go _only_ to our direct children, and not be delivered to any children spawned by those procs.
Refs https://www.mail-archive.com/users@lists.open-mpi.org/msg31221.html

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-15 12:26:11 -07:00
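
The two delivery modes in commit 206aec6083 correspond to signaling a process group versus a single pid. A minimal Python sketch, assuming each direct child leads its own process group (so that signaling the group does not hit the daemon itself); the function names and the boolean flag are hypothetical stand-ins for the odls_base_signal_direct_children_only MCA param:

```python
import os
import signal
import subprocess
import sys


def deliver_signal(child_pid: int, sig: int, direct_children_only: bool = False):
    """Signal a child, and by default everything still in its process group."""
    if direct_children_only:
        os.kill(child_pid, sig)
    else:
        # Reaches grandchildren too, as long as they stayed in the group.
        os.killpg(os.getpgid(child_pid), sig)


def spawn_child_in_own_group(seconds: int = 30) -> subprocess.Popen:
    """Start a sleeping child as leader of a new session and process group."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"],
        start_new_session=True,
    )
```

A grandchild that moves itself into a different process group is out of reach in either mode, which matches the "so long as they remain in the same process group" caveat in the commit message.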
Ralph Castain
8f09929469 Fix rank-file mapper launch by correctly setting up the remote map from the provided data
Put a simple protection for the case where procs fail while we are trying to deregister handlers

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-15 08:33:29 -07:00
Ralph Castain
93cf3c7203 Update OPAL and ORTE for thread safety
(I swear, if I look this over one more time, I'll puke)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 12:30:57 -07:00
Ralph Castain
9d6b929894 Fix uninitialized variable. Set exit codes for failed launch so we get pretty error messages
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-31 07:38:37 -07:00
Ralph Castain
5d990b557c Reorg ordering so that bare executable names also are found
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 15:58:55 -07:00
Ralph Castain
321abfc8c6 Fix cwd and preload-binary options
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 14:07:22 -07:00
Ralph Castain
ad108ba44d Fix the DVM
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 11:42:42 -07:00
Ralph Castain
657e701c65 Add debug verbosity to the orte data server and pmix pub/lookup functions
Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it).

Remove unneeded test

Fix memory corruption by re-initializing variable to NULL in loop

Resolve the race condition identified by @ggouaillardet by resetting the
mapped flag within the same event where it was set. There is no need to
retain the flag beyond that point as it isn't used again.

Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them.

Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers.

Have the mindist module add procs to the job's proc array as it is a fully described module

Protect the hnp-not-in-allocation case

Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-25 18:41:27 -07:00
Ralph Castain
bb1aaa3286 Use the node index to compare to daemon vpid when identifying procs to bind
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-14 02:37:25 -07:00
Ralph Castain
67156556ce On behalf of Josh, ensure we flag that the child is no longer alive since we are killing it with SIGKILL
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-13 21:07:26 -07:00
Ralph Castain
0500cc1c66 Update the debugger launch code to reflect the new backend mapping method.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 13:31:18 -07:00
Ralph Castain
92c996487c Update how we pass the node regex so we pass _all_ nodes, even those without daemons. This allows the backend daemons to form a complete picture of the allocation. Include info on which nodes have daemons on them, and populate that info on the backend as well.
Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap

Get the DVM running again

Fix direct modex by eliminating race condition caused by releasing data while sending it

Up the size limit before compressing

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-03 19:25:15 -07:00
Ralph Castain
583dbe954c Silence coverity dead-code warning
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-26 20:36:43 -07:00
Ralph Castain
35f817911e Fix coverity issues
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 08:09:46 -07:00
Ralph Castain
10d401b6ec Merge pull request #3217 from rhc54/topic/wdirs
Resolve a race condition for setting our working directory when fork/exec'ing application procs.
2017-03-21 17:39:54 -07:00
Ralph Castain
f8e1e3bed3 Ensure we properly exit with error if we cannot map the job
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 15:15:32 -07:00
Ralph Castain
75684dc260 Resolve a race condition for setting our working directory when fork/exec'ing application procs. We have to ensure we do it after the fork occurs since we want to use multiple threads in the odls. Otherwise, the different threads are bouncing the entire process around.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 13:54:03 -07:00
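
The race in commit 75684dc260 exists because the working directory is a per-process property: if several threads call chdir() before forking, they move the whole process around underneath each other. Changing directory in the child, after the fork, avoids this. A minimal sketch in Python (the helper name is hypothetical and the ORTE code is C; a real launcher would exec the application where this child merely reports its cwd):

```python
import os


def run_child_in_dir(wdir: str) -> str:
    """Fork, chdir only in the child, and return the cwd the child saw."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the chdir happens after the fork, so the parent
        # (and all of its threads) never changes directory.
        try:
            os.close(read_fd)
            os.chdir(wdir)
            os.write(write_fd, os.getcwd().encode())
        finally:
            os._exit(0)
    # Parent: collect the child's report and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_cwd = pipe.read().decode()
    os.waitpid(pid, 0)
    return child_cwd
```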
Ralph Castain
dc85e7fde7 Provide a little more help on the error messages when an executable isn't found so we have some better idea where we were looking for it. Don't double-report such errors. Ensure the ORTE_ERROR_NAME doesn't get a NULL back for the string name of an error code as that might cause some systems to segfault
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-17 09:54:37 -07:00
Ralph Castain
105fb152e1 Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 08:38:51 -07:00
Ralph Castain
b9f5cab710 Add a minor debug statement
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 18:15:44 -07:00
Ralph Castain
70591bf4dc Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 20:48:04 -08:00