Continue the reorganization of the configure system. Move files from the main config directory to their appropriate level-specific config directories. Modify the configure system to correctly handle compiler detection, test, and setup so that all things pertaining to opal and orte are done at the lower level, with the ompi configure system only looking at mpi-specific options.
Ensure the wrapper compilers for orte and ompi only get built when appropriate. Add support for c++ to the orte wrapper compilers, both script and non-script versions.
This commit was SVN r22138.
This commit looks larger than it really is since it includes a fair amount of code cleanup.
The SIGSTOP/SIGCONT+checkpointing work uses some of the functionality in r20391. Basic use case below (note that the checkpoint generated is useable as usual if the stopped application is terminated).
{{{
shell 1) mpirun -np 2 -am ft-enable-cr my-app
... running ...
shell 2) ompi-checkpoint --stop -v MPIRUN_PID
[localhost:001300] [ 0.00 / 0.20] Requested - ...
[localhost:001300] [ 0.00 / 0.20] Pending - ...
[localhost:001300] [ 0.01 / 0.21] Running - ...
[localhost:001300] [ 1.01 / 1.22] Stopped - ompi_global_snapshot_1234.ckpt
Snapshot Ref.: 0 ompi_global_snapshot_1234.ckpt
shell 2) killall -CONT mpirun
... Application Continues execution in shell 1 ...
}}}
Other items in this commit are mostly cleanup that has been sitting off-trunk for too long:
* Add a new {{{opal_crs_base_ckpt_options_t}}} type that encapsulates the various options that could be passed to the CRS. Currently only TERM and STOP, but this makes adding others ''much'' easier.
* Eliminate ORTE_SNAPC_CKPT_STATE_PENDING_TERM, since it served a redundant purpose with the new options type.
* Lay some basic ground work for some future features.
This commit was SVN r21995.
The following SVN revision numbers were found above:
r20391 --> open-mpi/ompi@0704b98668
1. default -npersocket to force -bind-to-socket
2. if we cannot get a value for cores/socket, try using #logical cpus. otherwise, default to 1 core
3. add missing error message for not-enough-processors
4. since we no longer loop through orte_register_params twice, put the auto-detect of
topology info in the rte_init for hnp and std_orted
5. fix bind-to-core, bysocket combination
This commit was SVN r21992.
Add ability to store the RM's jobid string to tag the notifier message so that the sys admin knows what job had the problem.
This commit was SVN r21687.
Remove all architecture references from ORTE and put them back in the modex using modex_send/recv calls.
Hetero operations are now fully supported again. Comm_spawn now works up to the point where it segfaults due to an error in the CID code - which now allows Edgar to dig further! :-)
This commit was SVN r21655.
that now the HNP send the messages using the routed component. In the case
of tree spawn, when a intermediary node spawn a child it doesn't know how
to forward a message to it, so when the node-map message is coming from
the HNP (as there is nothing yet in the contact/routing table) the message
is sent back the way it came. As a result the node-map message keeps jumping
between the HNP and the first level orteds.
The solution is to add a new option to the children orte_parent_uri, which
is only set when the orted is _not_ directly spawned by the HNP. When this
option is present on the argument list, the orted will add the parent to
its routing, and force the parent to update his routes (by sending the URI).
With this approach, the routing tree is build in same time as the processes
are spawned, and all messages from the HNP can be routed to the leaves.
However, this is far from an optimal solution. Right now, this so called tree
spawn, only spawn the children in a tree without doing anything about the
"connect back to the HNP" step. The HNP is flooded with reports from all the
orted. The total number of messages is higher than in the non tree startup
scheme, so we do not expect this approach to be scalable in the current
incarnation. A complete overhaul of the tree startup is required in order
improve the scalability. Stay tuned!
This commit was SVN r21504.
The open/select of the PLM is done in orte/mca/ess/base/ess_base_std_orted.c. It only is done when the PLM MCA param is set directing a specific PLM be selected. The function
orte_plm_base_orted_append_basic_args
clears the params passed to the daemon of any PLM selection passed to the HNP. Each PLM then adds a PLM directive if-and-only-if backend PLM support is desired. At present, Torque, SLURM, and rsh all specify this support and direct that the backend orted open the "rsh" PLM.
This commit was SVN r21488.
The following SVN revision numbers were found above:
r21480 --> open-mpi/ompi@ed585bce8a
If we do not initialize the PML, non-HNP daemons will not be able to use its functions. For example, RSH needs it when the tree_spawn mode is
enabled: daemons call orte_pml.remote_spawn() function to spawn their children in the deployment tree.
This commit was SVN r21480.
Add a new tm ess module that exploits this capability.
Update the various plm modules to enable it - just a minor change reflecting an added param to a plm base function.
Additional fixes included:
1. remove an erroneous cleanup of session directories in the tool finalize procedure - tools don't create session directories to begin with!
2. fix a duplicate free when attempting to execute a non-existent app
3. cleanup an typo in the comm utilities
4. fix comm_spawn - was perturbed by the changes in pack/unpack of orte_job_t to properly support orte-ps
Been tested on slurm and tm machines, using all tests in orte/test/mpi. May run into issue with command line length on large jobs due to inclusion of node info to support static ports - will fix this next with addition of regexp generator to compress that info.
This commit was SVN r21248.
This causes the orteds in the routing tree to remain alive until all termination "acks" from orteds below them have passed through. Thus, if we use static ports, we no longer require a direct orted-to-mpirun connection.
Also modify the binomial routed module so it conforms to what all the other routed modules do and have all messages pass along the routing tree instead of short-circuiting between orteds. This further reduces the number of ports being opened on backend nodes.
This commit was SVN r21203.
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches, that include headers, being "lost" due to
inclusion in one of the now deleted headers...
In total 817 files are touched.
In ompi/mpi/c/ header files are moved up into the actual c-file,
where necessary (these are the only additional #include),
otherwise it is only deletions of #include (apart from the above
additions required due to notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
several header files (previously included by header-files)
now have to be moved "upward".
This is mainly system headers such as string.h, stdio.h and for
networking, but also some orte headers.
This commit was SVN r21095.
In case we use memcmp, strlen, strup and friends include <string.h>
Also several constants.h are not included directly
- Let's have mca_topo_base_cart_create return ompi-errors in
ompi/mca/topo/base/topo_base_cart_create.c
This commit was SVN r20773.
Adapt orte_process_info to orte_proc_info, and
change orte_proc_info() to orte_proc_info_init().
- Compiled on linux-x86-64
- Discussed with Ralph
This commit was SVN r20739.
Correct an error wrt how jobids were being computed. Needed to ensure that the job family field was not overrun as we increment jobids for comm_spawn.
Update the slurm plm module so it uses the new slurm termination procedure (brings trunk back into alignment with 1.3 branch).
Update the slurmd ess component so it doesn't get selected if we are running a singleton inside of a slurm allocation.
Cleanup HNP init by moving some code that had been in orte_globals.c for historical reasons into the ess hnp module, and removing the call to that code from the ess_base_std_prolog
NOTE: this change allows orte to support an infinite aggregate number of comm_spawn's, with up to 64k being alive at any one instant. HOWEVER, the MPI layer currently does -not- support re-use of jobids. I did some prototype coding to revise the ompi_proc_t structures, but the BTLs are caching their own data, and there was no readily apparent way to update it. Thus, attempts to spawn more than the 64k limit will abort to avoid causing the MPI layer to hang.
This commit was SVN r20700.
Only proc_info.h-internal include file is opal/dss/dss_types.h
- In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
works fine, no errors.
Again, let's have MTT the last word.
This commit was SVN r20631.
Note: this requires that nodes not be shared by jobs/users. SLURM developers are working on an enhancement to remove this constraint.
Note 2: yes, the direct routed module returned! However, it is vastly different than the old one and has zero support for such things as comm_spawn. It is solely to support non-daemon, direct-launch environments.
This commit was SVN r20601.
Often, orte/util/show_help.h is included, although no functionality
is required -- instead, most often opal_output.h, or
orte/mca/rml/rml_types.h
Please see orte_show_help_replacement.sh commited next.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
actually showed two *missing* #include "orte/util/show_help.h"
in orte/mca/odls/base/odls_base_default_fns.c and
in orte/tools/orte-top/orte-top.c
Manually added these.
Let's have MTT the last word.
This commit was SVN r20557.
The prior ompi_proc_t structure had a uint8_t flag field in it, where only one
bit was used to flag that a proc was "local". In that context, "local" was
constrained to mean "local to this node".
This commit provides a greater degree of granularity on the term "local", to include tests
to see if the proc is on the same socket, PC board, node, switch, CU (computing
unit), and cluster.
Add #define's to designate which bits stand for which local condition. This
was added to the OPAL layer to avoid conflicting with the proposed movement of
the BTLs. To make it easier to use, a set of macros have been defined - e.g.,
OPAL_PROC_ON_LOCAL_SOCKET - that test the specific bit. These can be used in
the code base to clearly indicate which sense of locality is being considered.
All locations in the code base that looked at the current proc_t field have
been changed to use the new macros.
Also modify the orte_ess modules so that each returns a uint8_t (to match the
ompi_proc_t field) that contains a complete description of the locality of this
proc. Obviously, not all environments will be capable of providing such detailed
info. Thus, getting a "false" from a test for "on_local_socket" may simply
indicate a lack of knowledge.
This commit was SVN r20496.
y combination of comma-separated values and ranges. Daemons will use the first port in the range, MPI procs will use the other ports in the range assuming that they know their node rank in time and enough ports were specified.
NOTE: this capability only works under specific conditions. I will outline more about this in a note to devel as the remainder of the implementation progresses. For now, the only environment where this works is slurm. The linear routed module has also been adjusted to work with static ports so that all messaging flows strictly through the topology, including the initial daemon callback - thus limiting the number of sockets opened by mpirun.
This commit was SVN r20390.
Consolidate the nid/pid lookup code with the rest of the nid/pid code so that changes are easier to track. Add the ability to send cluster profile info as part of the nidmap. Cleanup the setup and teardown of the new global nidmap and pidmap objects.
This commit was SVN r20219.
Also, per chat with Jeff, modified the Makefile.am's of a few orte tools so that they were consistent in the way we generate the ompi-equivalent cmds.
This commit was SVN r20165.
This seems to fix the segv seen on process restart.
This commit was SVN r20051.
The following SVN revision numbers were found above:
r20022 --> open-mpi/ompi@9a57db4a81
This commit does that by ensuring that daemons retain knowledge of proc location for all procs in their job family. It required a minor change to the ESS API to allow the daemons to update their pidmaps as data was received. In addition, the routed modules have been updated to take advantage of the newly available info, and the encode/decode pidmap utilities have been updated to communicate the required info in the launch message.
This commit was SVN r20022.
It does not look like this effects the v1.3 branch since r19866 has not moved to the release branch.
Thanks to Leonardo Fialho for reporting this and supplying a patch.
This commit was SVN r19961.
The following SVN revision numbers were found above:
r19866 --> open-mpi/ompi@f54fda489e
1. remove direct routed module (hooray!)
2. add radix tree routed module (binomial remains default)
3. remove duplicate data storage - orteds were storing nidmap and pidmap data in odls, everyone else in ess
4. add ess APIs to update nidmap, add new pidmap - used only by orteds for MPI-2 support
5. modify code to eliminate multiple calls to orte_routed.update_route that recreated info already in ess pidmap. Add ess API to lookup that info instead. Modify routed modules to utilize that capability
6. setup new ability to shutdown orteds without sending back an "ack" message to mpirun - not utilized yet, will require some changes to plm terminate_orteds functions in managed environments (coming soon)
Initial tests indicating that fully routing comm via defined routing trees may not actually have a significant cost for operations like IB QP setup. More tests required to confirm.
This will require an autogen...
This commit was SVN r19866.
1. completely and cleanly separates responsibilities between the HNP, orted, and tool components.
2. removes all wireup messaging during launch and shutdown.
3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol.
4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0.
5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none".
6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout.
7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output"
This is not intended for the 1.3 release as it is a major change requiring considerable soak time.
This commit was SVN r19767.
Cleanup when an orted actually opens the PLM. Unfortunately, some unmentionable people are pushing head node environs out to remote nodes, causing the daemons to think they are the HNP. This helps prevent the confusion.
This commit was SVN r19518.
The problem was that (outside of Odin configure issues) that the IOF is no longer enabled by application processes.
Checkpoint/restart seems to be working once again.
Thanks to Ralph for pointing me here.
This commit was SVN r19508.
* Protect an orte variable used in the orte debugger stuff
* Initialize the datatype code in the Catamount code, as we need it
for intercommunicators (the proc code needs it to pack the remote
name)
* Turn on a bunch of the orte datatype code so that ORTE_NAME is available.
This commit was SVN r19457.
- Move up the __opal_attribute_noreturn__ information
- Actually make it known outside in ess.h
- Additionally allow printf-type checking
This commit was SVN r19210.
This system is "off" by default and only operates upon specific directive for selection of a notifier component. At the moment, the only available component will write an error message to the syslog.
This commit was SVN r19209.