openmpi

Author	SHA1	Message	Date
Ralph Castain	dee2d8646d	Fix plm/rsh runtime check Fix the check for rsh/ssh so we allow the check for SGE and LoadLeveler to occur if user doesn't specify their own launch agent. Fix a Coverity warning Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-13 16:54:03 -08:00
Ralph Castain	230d15f0d9	Cleanup the ras simulator capability, and the relay route thru grpcomm direct. Don't resend wireup info if nothing has changed Fix release of buffer Correct the unpacking order Fix the DVM - now minimized data transfer to it Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-01 15:01:58 -08:00
Ralph Castain	b59ae14a2a	Fix static port and partial allocation operations Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message. Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-28 10:09:44 -08:00
Ralph Castain	c803af5d3d	Minor change to allow qrsh to tree spawn, if supported Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-27 16:34:08 -08:00
Ralph Castain	7c795f4416	If the HNP is going to request topology info, it cannot do so via a routed OOB message as the intervening daemons may not be ready. So disable routing until the VM is ready, and have daemons start routing as they receive the xcast launch msg (which includes the data they need to talk to their peers). Do a little optimization and minimize recomputation of the routing plan. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-27 15:37:16 -08:00
Ralph Castain	d672fad849	Repair rsh/ssh tree spawn Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn. Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-27 11:35:00 -08:00
Josh Hursey	2e64bf42fb	Merge pull request #2810 from jjhursey/fix/ibm/stdiag-to-stdout Extend options for stddiag routing	2017-01-26 14:29:16 -06:00
Ralph Castain	399de0738e	Cleanup launch Given that we only set OOB contact info from inside of events, or before we begin threaded operations (e.g., in the ess), allow set_contact_info to directly update the oob/base framework globals. Correct the nidmap regex decompression routine. Ensure that rank=1 daemon always sends back its topology as this is the most common use-case. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-25 22:06:09 -08:00
Ralph Castain	2f4e87eae9	Have rank=1 daemon always send its topology back as this is the most common use-case Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-25 09:33:11 -08:00
Jeff Squyres	230bbc597d	plm base: make sure to assign "node" early enough Make sure to assign "node" before using it in ORTE_FLAG_SET. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-25 08:02:59 -08:00
Joshua Hursey	dcd9801f7c	orte/iof: Add orte_map_stddiag_to_stdout option * Similar to `orte_map_stddiag_to_stderr` except it redirects `stddiag` to `stdout` instead of `stderr`. * Add protection so that the user cannot supply both: - `orte_map_stddiag_to_stderr` - `orte_map_stddiag_to_stdout` Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-24 16:22:59 -06:00
Ralph Castain	86ab751c5e	Next step in reducing launch time: begin reducing the size of the launch message itself. Start by expressing the daemon map as a set of three regular expression strings. On an 8k cluster, this reduces the nidmap contribution from over 200kBytes to 21 bytes in size. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-23 19:54:47 -08:00
Ralph Castain	368684bd63	Revert e9bc293 and try a different approach for scalably dealing with hetero clusters. Have each orted send back its topo "signature". If mpirun detects that this signature has not been seen before, then ask for that daemon to send back its full topology description. This allows the system to only get the topology once for each unique topo in the cluster. Cleanup a typo, and remove no longer needed MCA params for hetero nodes and hetero apps. Hetero nodes will always be automatically detected. We don't support a mix of 32 and 64 bit apps Modify the orte_node_t to use orte_topology_t instead of hwloc_topology_t, updating all the places that use it. Ensure that we properly update topology when we see a different one on a compute node. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-18 10:22:15 -08:00
Ralph Castain	e9bc2934be	Add an MCA param "hnp_on_smgmt_node" that mpirun can use to tell the orteds to ignore its topology signature as mpirun is executing on a system mgmt node, and hence a different topology than the compute nodes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-16 19:32:01 -08:00
Gilles Gouaillardet	6b9343a966	plm/rsh: plug a memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:45 +09:00
Gilles Gouaillardet	c0c5dd8ccc	orte: plug a memory leak in orte_rml.recv_cancel do not invoke orte_rml.recv_cancel after the orte progress thread has gone Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:44 +09:00
Ralph Castain	9eab9a1ed3	Remove stale global variables Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers. Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation). Begin first cut at memory profiler Some minor cleanups of memprobe Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-02 14:04:24 -08:00
Ralph Castain	791f4f1ce3	Adjust debug output for clarity Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-26 14:04:20 -08:00
Ralph Castain	79cde184ad	Allow a PMIx tool to spawn a job Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-03 16:00:47 -08:00
Ralph Castain	d5fd635efe	Bring forward the debugger-related changes Refs https://github.com/open-mpi/ompi/pull/2425 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 13:15:20 -08:00
Howard Pritchard	2cbc0e8472	pmix/cray: fix disable-dlopen problem PR open-mpi/ompi#2432 introduced a regression where configure and build with --disable-dlopen caused build failure owing to unresolved alps lli symbols in the libopal-pal shared library. This commit fixes this problem. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-21 13:45:10 -06:00
Ralph Castain	649301a3a2	Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier). Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.	2016-10-23 21:52:39 -07:00
Gilles Gouaillardet	1846c2d8ad	plm/rsh: use an alternate port if the ORTE_NODE_PORT attribute is set	2016-10-19 16:18:52 +09:00
Ralph Castain	de7b1494d9	Clean out old cruft from the ORCM project	2016-09-21 00:13:30 -07:00
Gilles Gouaillardet	c09899f6af	plm: plug resource leaks as reported by Coverity with CIDs 72274 and 1196733	2016-09-07 10:08:44 +09:00
Ralph Castain	4e0788e9ad	Enable PSM to support dynamic processes Fix comm_spawn to correctly reference the actual parent process that requested the spawn when looking for the parent job object	2016-09-02 10:22:04 -07:00
Joshua Hursey	d26dd2c20e	orte: Expand the application of !orte_keep_fqdn_hostnames * Expand the use of the `orte_keep_fqdn_hostnames` MCA parameter when it is set to false. * If that parameter is set to false (default) then short hostnames (e.g., `node01`) will match with the long hostnames (e.g., `node01.mycluster.org`). This allows a user (or resource manager) to mix the use of short and long hostnames. - Note that this mechanism does _not_ perform a DNS lookup, but instead strips off the FQDN by truncating the hostname string at the first `.` character (when not an IP address). - By default (`false`) the following is true: `node01 == node01.mycluster.org == node01.bogus.com` since we use `node01` as the hostname.	2016-08-26 16:09:04 -05:00
Jeff Squyres	71ec5cfb43	rsh: robustify the check for plm_rsh_agent default value Don't strcmp against the default value -- the default value may change over time. Instead, check to see if the MCA var source is not DEFAULT. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-16 06:58:20 -05:00
Ralph Castain	9f43db7303	Further cleanup getpwuid usage - try it first (unless completely disabled), and then silently failover to try other methods.	2016-08-15 07:51:36 -07:00
Ralph Castain	5717b75b45	Restore the rsh template creation code	2016-08-12 12:43:40 -07:00
Ralph Castain	1c44543854	If the ssh agent hasn't been given, then check for qrsh and friends	2016-08-12 07:46:39 -07:00
Ralph Castain	ddd0d05de3	Fix a bug in the handling of nper<foo> when -host or -hostfile was given. Correctly mark slots as "given" when we auto-assign them. Ensure we don't set the number of procs when using nper<foo> so the PPR mapper can correctly assign them.	2016-07-12 09:27:02 -07:00
Ralph Castain	dd0f843843	Fix rare hangs observed on OS-X by properly thread-shifting upcalls from the PMIx server into ORTE	2016-06-05 21:39:44 -07:00
Ralph Castain	3913595e10	Enable simulation of large-scale clusters by allowing multiple daemons/node. Specifying the ras_base_multiplier parameter to be greater than 1 will cause ORTE to replicate each allocated node by that factor. A daemon will be spawned for each replica, thus letting ORTE function as if it were on a much larger cluster. Note that this cannot be used for MPI performance testing. It is really only useful for ORTE scaling tests. It also only works with the rsh/ssh launcher.	2016-05-29 18:56:18 -07:00
Ralph Castain	6ac7929bd0	Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need. Cleanups per @jjhursey review	2016-05-01 11:30:25 -07:00
Jeff Squyres	6800ef9ec0	m4: rename OMPI_SUMMARY_* macros to OPAL_SUMMARY_* These macros should really be named OPAL_SUMMARY_*; they're used in all projects, and therefore should be in the lowest layer project (OPAL). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-04-20 08:40:00 -07:00
Ralph Castain	437f5b4289	Fix map-by node and do-not-launch	2016-04-13 09:21:19 -07:00
Ralph Castain	503e1274a9	Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements Ensure the returned exit status is non-zero if we fail to map If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported. If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules. If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row #7 on Jeff's spreadsheet is incorrect. With that one correction, this now passes all the given use-cases on that spreadsheet. Make things behave under unmanaged allocations more like their managed cousins - if the #slots is given, then no-np shall fill things up. Fixes #1344	2016-03-29 11:21:57 -07:00
Howard Pritchard	69200e6229	plm/alps: fix usage of cray wlm_detect methods Turns out there are some cases where the Cray wlm_detect_get_active may return NULL, in which case fallback to wlm_detect_get_default method is suggested. Make use of the fallback to avoid segfaults under some circumstances in the ALPS plm selection method. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-03-22 11:40:56 -07:00
Ralph Castain	c146c4969b	Revert part of open-mpi/ompi@c1bbbb5e2f to restore the usock component, thus fixing show_help aggregation. Fixes #1467 Restore debugger attach operations Fixes #1225	2016-03-18 21:49:04 -07:00
Jeff Squyres	48c650c47a	configury: minor updates to config summary output	2016-03-10 13:02:52 -08:00
Ralph Castain	4a55fba414	Fix registration of error handlers thru the pmix120 component. A thread-shift operation was hanging on the sync_event_base, which made it dependent on someone calling opal_progress. Unfortunately, a process in "sleep" or spinning outside the MPI library won't do that, and so we never complete errhandler registration.	2016-03-02 15:01:01 -08:00
Ralph Castain	64b7728f33	Fix typo - do not look at daemon job when considering completion of launch	2016-02-21 14:44:51 -08:00
Ralph Castain	d653cf2847	Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM.	2016-02-21 11:55:49 -08:00
Ralph Castain	6e68d758b9	Cleanup some valgrind complaints about jumps with uninitialized values. Fix a few IOF issues reported by Mark Santcroos when submitting jobs from tools. Add the ability to pass directives to the --output-filename option that tell ORTE to (a) not include the jobid in the path to the output files, and (b) not to copy the output to the tool (i.e., just store it in the files). Remove stale debug Fix a segfault if no subscribers are present	2016-02-18 16:30:37 -08:00
Ralph Castain	60a7bc2e50	Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion. Fixes #1225	2016-02-18 09:29:12 -08:00
Ralph Castain	e0de4423ba	Remove debug	2016-02-16 20:58:53 -08:00
Mark Santcroos	14f0390b7d	Release child object when we are recording someone's relatives. (Thanks to Mark Santcroos!) Release routing list entries. (Thanks to Mark Santcroos!) Address some Coverity concerns	2016-02-15 20:50:42 -08:00
Ralph Castain	06c3dfc052	Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. * Clean up the DVM so it continues to run even when applications error out and we would ordinarily abort the daemons. * Create a new errmgr component for the DVM to handle the differences. * Cleanup the DVM state component. * Add ORTE bindings directory and brief README * Pass a local tool index around to match jobs. * Pass the jobid on job completion. * Fix initialization logic. * Add framework for python wrapper. * Fix terminate-with-non-zero-exit behavior so it properly terminates only the indicated procs, notifies orte-submit, and orte-dvm continues executing. * Add some missing options to orte-dvm * Fix a bug in -host processing that caused us to ignore the #slots designator. Add a new attribute to indicate "do not expand the DVM" when submitting job spawn requests. * It actually makes no sense that we treat the termination of all children differently than terminating the children of a specific job - it only creates confusion over the difference in behavior. So terminate children the same way regardless. Extend the cmd_line utility to easily allow layering of command line definitions Catch up with ORTE interface change and make build more generic. Disable "fixed dvm" logic for now. Add another cmd_line function to merge a table of cmd line options with another one, reporting as errors any duplicate entries. Use this to allow orterun to reuse the orted_submit code Fix the "fixed_dvm" logic by ensuring we reset num_new_daemons to zero. Also ensure that the nidmap is sent with the first job so the downstream daemons get the node info. Remove a duplicate cmd line entry in orterun. Revise the DVM startup procedure to pass the nidmap only once, at the startup of the DVM. This reduces the overhead on each job launch and ensures that the nidmap doesn't get overwritten. Add new commands to get_orted_comm_cmd_str(). 
Move ORTE command line options to orte_globals.[ch]. Catch up with extra orte_submit_init parameter. Add example code. Add documentation. Bump version. The nidmap and routing data must be updated prior to propagating the xcast or else the xcast will fail. Fix the return code so it is something more expected when an error occurs. Ensure we get an error returned to us when we fail to launch for some reason. In this case, we will always get a launch_cb as we did indeed attempt to spawn it. The error code will be returned in the complete_cb. Fix the return code from orte_submit_job - it was returning the tracker index instead of "success". Take advantage of ORTE's pretty-print capabilities to provide a nice error output explaining why we failed to launch. Ensure we always get a launch_cb when we fail to launch, but no complete_cb as the job never launched. Extend the error reporting capability to job completion as well. Add index parameter to orte_submit_job(). Add orte_job_cancel and implement ORTE_DAEMON_TERMINATE_JOB_CMD. Factor out dvm termination. Parse the terminate option at tool level. Add error string for ORTE_ERR_JOB_CANCELLED. Add some safeguards. Cleanup and/of comments. Enable the return. Properly ORTE_DECLSPEC orte_submit_halt. Add orte_submit_halt and orte_submit_cancel to interface. Use the plm interface to terminate the job	2016-02-13 08:10:44 -08:00
Howard Pritchard	39367ca0bf	plm/alps: only use srun for Native SLURM Turns out that the way the SLURM plm works is not compatible with the way MPI processes on Cray XC obtain RDMA credentials to use the high speed network. Unlike with ALPS, the mpirun process is on the first compute node in the job. With the current PLM launch system, mpirun (HNP daemon) launches the MPI ranks on that node rather than relying on srun. This will probably require a significant amount of effort to rework to support Native SLURM on Cray XC's. As a short term alternative, have the alps plm (which gets selected by default again on Cray systems regardless of the launch system) check whether or not srun or alps is being used on the system. If alps is not being used, print a helpful message for the user and abort the job launch. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2015-12-22 11:03:42 -08:00
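
The hostname matching rule described in commit d26dd2c20e can be illustrated with a short sketch. This is not the ORTE implementation (which is written in C); it is a minimal Python model of the behavior the commit message states: when `orte_keep_fqdn_hostnames` is false, a hostname is truncated at the first `.` unless it is an IP address, and no DNS lookup is performed. The function names `is_ip_address`, `short_hostname`, and `hosts_match` are hypothetical and chosen only for this example.

```python
import ipaddress


def is_ip_address(name: str) -> bool:
    """Return True if the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def short_hostname(name: str, keep_fqdn: bool = False) -> str:
    """Model of the truncation rule: strip everything after the first '.'.

    keep_fqdn stands in for the orte_keep_fqdn_hostnames MCA parameter
    (default false). IP addresses are left untouched.
    """
    if keep_fqdn or is_ip_address(name):
        return name
    return name.split(".", 1)[0]


def hosts_match(a: str, b: str, keep_fqdn: bool = False) -> bool:
    """Compare two hostnames after applying the truncation rule."""
    return short_hostname(a, keep_fqdn) == short_hostname(b, keep_fqdn)


if __name__ == "__main__":
    # Short and long names match by default, even across different domains.
    print(hosts_match("node01", "node01.mycluster.org"))
    print(hosts_match("node01.mycluster.org", "node01.bogus.com"))
    # With keep_fqdn set, the names are compared as given.
    print(hosts_match("node01", "node01.mycluster.org", keep_fqdn=True))
```

Because the rule is pure string truncation, `node01.mycluster.org` and `node01.bogus.com` compare equal under the default setting, which is the caveat the commit message calls out.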


692 commits