Aleksej Saushev.
Dont use bash or bashism in shell scripts
We should use Posix' setpgid(0,0), which is equivalent to setpgrp().
This commit was SVN r22829.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.
The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.
This commit was SVN r22824.
Cleanup the kill_procs command by removing a no-longer-used param. We update the process state when the proc actually exits.
This commit was SVN r22783.
Modify the orte configure options to specify --enable-multicast such that it directs components to build or not instead of littering the code base with #if's. Remove those #if's where they used to occur.
Add a new grpcomm "mcast" module to support multicast operations. Still some work required to properly perform daemon collectives for comm_spawn operations. New module only builds when --enable-multicast is provided, and when specifically selected.
This commit was SVN r22709.
The change to just using SIGKILL was originally done due to problems whereby waitpid thought a proc had died, but it hadn't. We'll continue debugging that problem separately, but SIGTERM is required for C/R to work properly.
This commit was SVN r22674.
Have orte call setpgrp after forking (but before exec) when
orte_forward_job_control is set. Then have it send signals to the
child's process group. This allows suspending jobs that fork.
If a SIGTSTP arrives before the processes have been launched, then
record it and suspend them right after launching.
This commit was SVN r22557.
In CMake 2.6 and earlier, this function add dependencies for targets and also link the target libraries automatically, but in CMake 2.8,this behavior has been changed, i.e. it will only add the dependencies but no link, which will cause linking errors at compilation time.
This commit was SVN r22405.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.
If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.
Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).
Adjust some platform files to enable these capabilities.
This commit was SVN r22244.
The only sure way to kill the thing is with SIGKILL. After hours spent trying to debug this bizarre situation with a reliable reproducer, I finally tracked it down and fixed it.
Go figure...I sure can't.
This commit was SVN r22220.
Continue the reorganization of the configure system. Move files from the main config directory to their appropriate level-specific config directories. Modify the configure system to correctly handle compiler detection, test, and setup so that all things pertaining to opal and orte are done at the lower level, with the ompi configure system only looking at mpi-specific options.
Ensure the wrapper compilers for orte and ompi only get built when appropriate. Add support for c++ to the orte wrapper compilers, both script and non-script versions.
This commit was SVN r22138.
Wouldn't be so bad, except...the errmgr orders the termination of ALL procs, which kills any other job that should have been left alone.
Add a new proc and job state indicating "killed_by_cmd" so we can tell the difference between a proc/job that was deliberately terminated by us vs one that is killed by external signal.
This change was tested to ensure it didn't interfere with ctrl-c operation (it doesn't - we order termination of all jobs when we get a ctrl-c).
This commit was SVN r22100.
1. ensure that orte_rmaps_base_schedule_policy does not override cmd line settings
2. when you try to bind to more cores than we have, generate a not-enough-processors error message
3. allow npersocket -bind-to-core combination - because, yes, somebody actually wants to do it.
This commit was SVN r21996.
1. default -npersocket to force -bind-to-socket
2. if we cannot get a value for cores/socket, try using #logical cpus. otherwise, default to 1 core
3. add missing error message for not-enough-processors
4. since we no longer loop through orte_register_params twice, put the auto-detect of
topology info in the rte_init for hnp and std_orted
5. fix bind-to-core, bysocket combination
This commit was SVN r21992.
The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system.
This commit was SVN r21975.
1. finalize the logic for properly respecting externally assigned bindings. Thanks to Chris Samuel for his help with this. Still needs some acid testing, but appears to now work.
2. remove the double-logic of requiring opal_paffinity_alone AND bind-to-foo. If the user specifies bind-to-foo, trust her and just do it.
This commit was SVN r21885.
I have no machine which allows me to do external binding, so I will have to ask others to test the new logic. However, I did verify that these changes don't break the existing logic when no external bindings were present.
This commit was SVN r21842.
Adds several new mpirun options:
* -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding.
* -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned.
* -bind-to-core - currently the default behavior (maintained from prior default)
* -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile
Similar features/options are provided at the board level for multi-board nodes.
Documentation to follow...
This commit was SVN r21791.
Update the loop_spawn test to remove a sleep so that it runs at max speed, letting the new code catch when we overrun ourselves and wait for room to be cleared for the next comm_spawn.
This commit was SVN r21390.
messages when delivering a signal (like STOP or CONT)
to a non-existant process. This fixes trac:1929.
Also, only print one error message in the other cases.
This commit was SVN r21263.
The following Trac tickets were found above:
Ticket 1929 --> https://svn.open-mpi.org/trac/ompi/ticket/1929
Add a new tm ess module that exploits this capability.
Update the various plm modules to enable it - just a minor change reflecting an added param to a plm base function.
Additional fixes included:
1. remove an erroneous cleanup of session directories in the tool finalize procedure - tools don't create session directories to begin with!
2. fix a duplicate free when attempting to execute a non-existent app
3. cleanup an typo in the comm utilities
4. fix comm_spawn - was perturbed by the changes in pack/unpack of orte_job_t to properly support orte-ps
Been tested on slurm and tm machines, using all tests in orte/test/mpi. May run into issue with command line length on large jobs due to inclusion of node info to support static ports - will fix this next with addition of regexp generator to compress that info.
This commit was SVN r21248.
1. replacing mpi_paffinity_alone with opal_paffinity_alone - for back-compatibility, I have aliased mpi_paffinity_alone to the new param name. This caus
es a mild abstraction break in the opal/mca/paffinity framework - per the devel discussion...live with it. :-) I also moved the ompi_xxx global variable
that tracked maffinity setup so it could be properly closed in MPI_Finalize to the opal/mca/maffinity framework to avoid an abstraction break.
2. Added code to the odls/default module to perform paffinity binding and maffinity init between process fork and exec. This has been tested on IU's odi
n cluster and works for both MPI and non-MPI apps.
3. Revise MPI_Init to detect if affinity has already been set, and to attempt to set it if not already done. I have *not* tested this as I haven't yet f
igured out a way to do so - I couldn't get slurm to perform cpu bindings, even though it supposedly does do so.
This has only been lightly tested and would definitely benefit from a wider range of evaluation...
This commit was SVN r21209.
This causes the orteds in the routing tree to remain alive until all termination "acks" from orteds below them have passed through. Thus, if we use static ports, we no longer require a direct orted-to-mpirun connection.
Also modify the binomial routed module so it conforms to what all the other routed modules do and have all messages pass along the routing tree instead of short-circuiting between orteds. This further reduces the number of ports being opened on backend nodes.
This commit was SVN r21203.
OMPI_* to OPAL_*. This allows opal layer to be used more independent
from the whole of ompi.
NOTE: 9 "svn mv" operations immediately follow this commit.
This commit was SVN r21180.
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches, that include headers, being "lost" due to
inclusion in one of the now deleted headers...
In total 817 files are touched.
In ompi/mpi/c/ header files are moved up into the actual c-file,
where necessary (these are the only additional #include),
otherwise it is only deletions of #include (apart from the above
additions required due to notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
several header files (previously included by header-files)
now have to be moved "upward".
This is mainly system headers such as string.h, stdio.h and for
networking, but also some orte headers.
This commit was SVN r21095.
In case we use memcmp, strlen, strup and friends include <string.h>
Also several constants.h are not included directly
- Let's have mca_topo_base_cart_create return ompi-errors in
ompi/mca/topo/base/topo_base_cart_create.c
This commit was SVN r20773.
Adapt orte_process_info to orte_proc_info, and
change orte_proc_info() to orte_proc_info_init().
- Compiled on linux-x86-64
- Discussed with Ralph
This commit was SVN r20739.
Only proc_info.h-internal include file is opal/dss/dss_types.h
- In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
works fine, no errors.
Again, let's have MTT the last word.
This commit was SVN r20631.
Often, orte/util/show_help.h is included, although no functionality
is required -- instead, most often opal_output.h, or
orte/mca/rml/rml_types.h
Please see orte_show_help_replacement.sh commited next.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
actually showed two *missing* #include "orte/util/show_help.h"
in orte/mca/odls/base/odls_base_default_fns.c and
in orte/tools/orte-top/orte-top.c
Manually added these.
Let's have MTT the last word.
This commit was SVN r20557.
Correct the orte-show-help file when a rank is out of bounds, and do that test where a wildcard doesn't get incorrectly flagged as out-of-bounds.
This commit was SVN r20398.
1. fix a race condition whereby a proc's output could trigger an event prior to the other outputs being setup, thus c ausing the IOF to declare the proc "terminated" too early. This was really rare, but could happen.
2. add a new "timestamp-output" option that timestamp's each line of output
3. add a new "output-filename" option that redirects each proc's output to a separate rank-named file.
4. add a new "xterm" option that redirects the output of the specified ranks to a separate xterm window.
This commit was SVN r20392.
SIGCONT to the a.outs. By default, they are not forwarded and
the behavior remains as it has always been. However, if one
runs with --mca orte_forward_job_control 1, then mpirun will
catch those two signals and forward them to the orteds which
will deliver them to the a.outs. We have had requests for
this feature.
This commit was SVN r20391.
If the --wdir option is given, check to see if the user provided a relative path. If so, convert it to an absolute path. This is needed to maintain consistent behavior across environements. Some environments automatically chdir to your current working directory when launching the remote orted, while others (e.g., ssh) don't. This levels the playing field and reduces user surprise.
This commit was SVN r20342.
Reverted r20306 since the fix caused 100% failues on our !BigRed system.
See the comments on ticket #1763 for the details.
This commit was SVN r20339.
The following SVN revision numbers were found above:
r20306 --> open-mpi/ompi@8c87e48721
The following Trac tickets were found above:
Ticket 1763 --> https://svn.open-mpi.org/trac/ompi/ticket/1763
* Improved the error propagation from a backend orted
* Fixed a hang in orterun due to failed files transferred
* Fix the movement of files with relative path names
* Improved error messages when a file cannot be moved
* Move file checks to FileM instead of embedding then in the ODLS
This commit Refs trac:1770
This commit was SVN r20331.
The following Trac tickets were found above:
Ticket 1770 --> https://svn.open-mpi.org/trac/ompi/ticket/1770
Also, per chat with Jeff, modified the Makefile.am's of a few orte tools so that they were consistent in the way we generate the ompi-equivalent cmds.
This commit was SVN r20165.
Basically, the remaining problem turned out to be:
1. closing stdout/stderr during orte_finalize of mpirun
2. inadvertently setting up a write event on fd = -1
3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once
This passed prelim MTT testing by Jeff and Tim, but should soak for awhile before migrating to 1.3.
This commit was SVN r20106.
The following SVN revision numbers were found above:
r20064 --> open-mpi/ompi@a07660aea8
r20068 --> open-mpi/ompi@ec930d14a9
r20074 --> open-mpi/ompi@2940309613
It is a small launch performance improvement as now we relay the launch cmd across to the next daemon before taking the time to launch our own local procs. Still, it does allow more parallel operations during the launch procedure.
This commit was SVN r20104.
1. a direct callback from waitpid - this set the waitpid_fired flag
2. a notify event callback from the IOF - this set the iof complete flag
3. a message via the daemon cmd processor from the proc "de-registering" the sync, thus indicating it was going through MPI_Finalize.
The problem is that these could overlap, with the first two allowing the orted to declare the proc complete before the daemon had responded to #3.
This change forces all three events to flow through the daemon cmd processor, thus ensuring an ordered handling. I'm not certain this will solve the problem, but will await further MTT reports to see. Unfortunately, the problem doesn't show up on any manual or script-based tests I have been able to run, even when I duplicate the exact cmd that fails under MTT.
This commit was SVN r20074.
1. coordination of job completion notification to include a requirement for both waitpid detection AND notification that all iof pipes have been closed by the app
2. change of all IOF read and write events to be non-persistent so they can properly be shutdown and restarted only when required
3. addition of a delay (currently set to 10ms) before restarting the stdin read event. This was required to ensure that the stdout, stderr, and stddiag read events had an opportunity to be serviced in scenarios where large files are attached to stdin.
This commit was SVN r20064.
This commit does that by ensuring that daemons retain knowledge of proc location for all procs in their job family. It required a minor change to the ESS API to allow the daemons to update their pidmaps as data was received. In addition, the routed modules have been updated to take advantage of the newly available info, and the encode/decode pidmap utilities have been updated to communicate the required info in the launch message.
This commit was SVN r20022.
1. modify the iof to track when a proc actually closes all of its open iof output pipes. When this occurs, notify the odls that the proc's iof is complete. This is done via a zero-time event so that we can step out of the read event before processing the notification.
2. in the odls, modify the waitpid callback so it only flags that it was called. Add a function to receive the iof-complete notification, and a function that checks for both iof complete and waitpid callback before declaring a proc fully terminated. This ensures that we read and deliver -all- of the IO prior to declaring the job complete.
Also modified the odls call to orte_iof.close (and the component's implementation) so it only closes stdin, leaving the other io channels alone. This fixes the other half of the known problem.
This should fix the ticket on this subject, but I'll wait to close it pending further testing in the trunk.
This commit was SVN r19991.
The problem was that we close the same fd twice, and that meantime the fd could have been reassigned
to some other file or socket.
This commit was SVN r19869.
1. remove direct routed module (hooray!)
2. add radix tree routed module (binomial remains default)
3. remove duplicate data storage - orteds were storing nidmap and pidmap data in odls, everyone else in ess
4. add ess APIs to update nidmap, add new pidmap - used only by orteds for MPI-2 support
5. modify code to eliminate multiple calls to orte_routed.update_route that recreated info already in ess pidmap. Add ess API to lookup that info instead. Modify routed modules to utilize that capability
6. setup new ability to shutdown orteds without sending back an "ack" message to mpirun - not utilized yet, will require some changes to plm terminate_orteds functions in managed environments (coming soon)
Initial tests indicating that fully routing comm via defined routing trees may not actually have a significant cost for operations like IB QP setup. More tests required to confirm.
This will require an autogen...
This commit was SVN r19866.
Without this patch, doing spawn in a loop ended up by exhausting all
available file descriptors pretty quickly. There were about 5 file
descriptors opened per spawned process. Now the number of file
descriptors managed by the process (orted or HNP)
is a lot smaller.
This commit was SVN r19864.
1. completely and cleanly separates responsibilities between the HNP, orted, and tool components.
2. removes all wireup messaging during launch and shutdown.
3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol.
4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0.
5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none".
6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout.
7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output"
This is not intended for the 1.3 release as it is a major change requiring considerable soak time.
This commit was SVN r19767.
Refers to ticket #1548. Although this appears to fix the problem, the ticket will be held open pending further test prior to transition to the 1.3 branch.
This commit was SVN r19674.
See opal/mca/paffinity/paffinity.h for explanation as to the physical vs logical nature of the params used in the API.
Fixes trac:1435
This commit was SVN r19391.
The following Trac tickets were found above:
Ticket 1435 --> https://svn.open-mpi.org/trac/ompi/ticket/1435
Provide support for four MPIR extensions that allow specification of debugger daemon executable, argv for the debugger daemon, whether or not to forward debugger daemon IO, and whether or not debugger daemon will piggy-back on ORTE OOB network. Last is not yet implemented.
No change in behavior or operation occurs unless (a) the debugger specifically utilizes the extensions and, for co-locate while running, the user specifically enables the capability via an MCA param. Two of the MPIR extensions supported here are used in a widely-used debugger for a large-scale installation. The other two extensions are new and being utilized in prototype work by several debuggers for possible future release.
This commit was SVN r19275.
* add "register" function to mca_base_component_t
* converted coll:basic and paffinity:linux and paffinity:solaris to
use this function
* we'll convert the rest over time (I'll file a ticket once all
this is committed)
* add 32 bytes of "reserved" space to the end of mca_base_component_t
and mca_base_component_data_2_0_0_t to make future upgrades
[slightly] easier
* new mca_base_component_t size: 196 bytes
* new mca_base_component_data_2_0_0_t size: 36 bytes
* MCA base version bumped to v2.0
* '''We now refuse to load components that are not MCA v2.0.x'''
* all MCA frameworks versions bumped to v2.0
* be a little more explicit about version numbers in the MCA base
* add big comment in mca.h about versioning philosophy
This commit was SVN r19073.
The following Trac tickets were found above:
Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
Cleanup the logic in the odls for when processes terminate. It turns out that we were only going through the kill_proc logic once instead of looping over all local children when we ordered a daemon to kill its local procs. This went unnoticed for some time as for most systems the local procs were terminated anyway when the daemon terminated due to the parent/child relationship.
Solaris is apparently different - the children are not automatically terminated when the parent dies. As a result, it acts as a detector for this bug.
Mucho thanks to Rolf V. for his help in debugging - and to IM for letting me follow his gdb progress in quasi real-time!
This commit was SVN r19044.
Modify the odls to remove a (size_t) typecast in front of the num_processors variable just in case it is returned negative. This usually is accompanied by an opal_error, so this shouldn't make any difference - but it is more technically correct.
This commit was SVN r19008.
Fixed allocation of all ranks when using RANKFILE, but not all ranks assigned
Aborting if using RANKFILE, but np wasn't specified a little earlier
Clean mca_rmaps_rank_file_component.debug
This commit was SVN r19004.
Short version: remove opal_paffinity_alone and restore
mpi_paffinity_alone. ORTE makes various information available for the
MPI layer to decide what it wants to do in terms of processor
affinity.
Details:
* remove opal_paffinity_alone MCA param; restore mpi_paffinity_alone
MCA param
* move opal_paffinity_slot_list param registration to paffinity base
* ompi_mpi_init() calls opal_paffinity_base_slot_list_set(); if that
succeeds use that. If no slot list was set, see if
mpi_paffinity_alone was set. If so, bind this process to its Node
Local Rank (NLR). The NLR is the ORTE-maintained slot ID; if you
COMM_SPAWN to a host in this ORTE universe that already has procs
on it, the NLR for the new job will start at N (not 0). So this is
slightly better than mpi_paffinity_alone in the v1.2 series.
* If a slot list is specified *and* mpi_paffinity_alone is set, we
display an error and abort.
* Remove calls from rmaps/rank_file component to register and lookup
opal_paffinity mca params.
* Remove code in orte/odls that set affinities - instead, have them
just pass a slot_list if it exists.
* Cleanup the orte/odls code that determined
oversubscribed/want_processor as these were just opposites of each
other.
This commit was SVN r18874.
The following Trac tickets were found above:
Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
1. repair of the linear and direct routed modules
2. repair of the ompi/pubsub/orte module to correctly init routes to the ompi-server, and correctly handle failure to correctly parse the provided ompi-server URI
3. modification of orterun to accept both "file" and "FILE" for designating where the ompi-server URI is to be found - purely a convenience feature
4. resolution of a message ordering problem during the connect/accept handshake that allowed the "send-first" proc to attempt to send to the "recv-first" proc before the HNP had actually updated its routes.
Let this be a further reminder to all - message ordering is NOT guaranteed in the OOB
5. Repair the ompi/dpm/orte module to correctly init routes during connect/accept.
Reminder to all: messages sent to procs in another job family (i.e., started by a different mpirun) are ALWAYS routed through the respective HNPs. As per the comments in orte/routed, this is REQUIRED to maintain connect/accept (where only the root proc on each side is capable of init'ing the routes), allow communication between mpirun's using different routing modules, and to minimize connections on tools such as ompi-server. It is all taken care of "under the covers" by the OOB to ensure that a route back to the sender is maintained, even when the different mpirun's are using different routed modules.
6. corrections in the orte/odls to ensure proper identification of daemons participating in a dynamic launch
7. corrections in build/nidmap to support update of an existing nidmap during dynamic launch
8. corrected implementation of the update_arch function in the ESS, along with consolidation of a number of ESS operations into base functions for easier maintenance. The ability to support info from multiple jobs was added, although we don't currently do so - this will come later to support further fault recovery strategies
9. minor updates to several functions to remove unnecessary and/or no longer used variables and envar's, add some debugging output, etc.
10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to true if the provided proc is a daemon
There is still more cleanup to be done for efficiency, but this at least works.
Tested on single-node Mac, multi-node SLURM via odin. Tests included connect/accept, publish/lookup/unpublish, comm_spawn, comm_spawn_multiple, and singleton comm_spawn.
Fixes ticket #1256
This commit was SVN r18804.