openmpi

Author	SHA1	Message	Date
Ralph Castain	b225366012	Bring the ofi/rml component online by completing the wireup protocol for the daemons. Cleanup the current confusion over how connection info gets created and passed to make it all flow thru the opal/pmix "put/get" operations. Update the PMIx code to latest master to pickup some required behaviors. Remove the no-longer-required get_contact_info and set_contact_info from the RML layer. Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing will be beneficial at large scales. Leave it "off" by default. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-20 21:01:57 -07:00
Ralph Castain	00ba6a1be6	Protect against NULL topology Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-08 20:56:44 -07:00
Ralph Castain	87201a80ff	Silence coverity warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-27 11:45:53 -07:00
Ralph Castain	657e701c65	Add debug verbosity to the orte data server and pmix pub/lookup functions Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it). Remove unneeded test Fix memory corruption by re-initializing variable to NULL in loop Resolve the race condition identified by @ggouaillardet by resetting the mapped flag within the same event where it was set. There is no need to retain the flag beyond that point as it isn't used again. Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them. Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers. Have the mindist module add procs to the job's proc array as it is a fully described module Protect the hnp-not-in-allocation case Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-25 18:41:27 -07:00
Ralph Castain	f47124e4d3	Finally fix the problem - the key was knowing there were more than 2 topologies involved, and that the HNP is not allocated. Give up on being cute and just search the darned list of topologies - there won't be that many, and if there are (so the scan takes awhile), then too bad. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-10 16:44:19 -07:00
Ralph Castain	55f4b825af	Add verbose output to nidmap code for debugging as this is a new, and sometimes fragile, feature Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-10 12:40:02 -07:00
Ralph Castain	442e307a6e	Fix the nidmap computation to deal with hetero nodes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-10 08:43:28 -07:00
Ralph Castain	42d31454a5	Merge pull request #3469 from rhc54/topic/nidmap Do not pass topologies during tree spawn of daemons as there is no wa…	2017-05-08 06:22:50 -07:00
Gilles Gouaillardet	e101f2b3f9	orte/util: fix vpids parsing in orte_util_nidmap_parse() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-05-08 16:46:13 +09:00
Ralph Castain	180809f2ef	Do not pass topologies during tree spawn of daemons as there is no way the HNP can know the backend topologies at that point. Any needed topologies will be sent along with the launch_apps command Do not pass param file MCA params if the user has requested that no param files be read - required when trying to avoid launch time penalties from large numbers of processes reading default param files. The daemon picks them up and passes them along anyway, so it isn't clear what value we gain from having them all read the defaults Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-07 21:14:43 -07:00
Ralph Castain	b526bca56c	Fix a potential segfault by avoiding NULL topologies prior to launching the VM. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-06 20:51:19 -07:00
Ralph Castain	a29ca2bb0d	Enable slurm operations on Cray with constraints Cleanup some errors in the nidmap code that caused us to send unnecessary topologies Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-06 08:58:06 -07:00
Ralph Castain	92c996487c	Update how we pass the node regex so we pass _all_ nodes, even those without daemons. This allows the backend daemons to form a complete picture of the allocation. Include info on which nodes have daemons on them, and populate that info on the backend as well. Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap Get the DVM running again Fix direct modex by eliminating race condition caused by releasing data while sending it Up the size limit before compressing Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-03 19:25:15 -07:00
Ralph Castain	d645557fa0	Update to include the PMIx 2.0 APIs for monitoring and job control. Include required integration, but leave the monitors off for now. Move the sensor framework out of ORTE as it is being absorbed into PMIx Fix typo and silence warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-21 17:47:08 -07:00
Ralph Castain	61a71e25ef	Ensure the backend daemons know if we are in a managed allocation and if the HNP was included in the allocation Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-14 10:06:43 -07:00
Ralph Castain	48fc339718	Create an alternative mapping method that pushes responsibility onto the backend daemons. By default, let mpirun only pack the app_context info and send that to the backend daemons where the mapping will be done. This significantly reduces the computational time on mpirun as it isn't running up/down the topology tree computing thousands of binding locations, and it reduces the launch message to a very small number of bytes. When running -novm, fall back to the old way of doing things where mpirun computes the entire map and binding, and then sends the full info to the backend daemon. Add a new cmd line option/mca param --fwd-mpirun-port that allows mpirun to dynamically select a port, but then passes that back to all the other daemons so they will use that port as a static port for their own wireup. In this mode, we no longer "phone home" directly to mpirun, but instead use the static port to wireup at daemon start. We then use the routing tree to rollup the initial launch report, and limit the number of open sockets on mpirun's node. Update ras simulator to track the new nidmap code Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found. Update gadget platform file Initialize the range count when starting a new range Fix the no-np case in managed allocation Ensure DVM node usage gets cleaned up after each job Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-07 20:43:12 -08:00
Ralph Castain	b59ae14a2a	Fix static port and partial allocation operations Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message. Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-28 10:09:44 -08:00
Ralph Castain	7c795f4416	If the HNP is going to request topology info, it cannot do so via a routed OOB message as the intervening daemons may not be ready. So disable routing until the VM is ready, and have daemons start routing as they receive the xcast launch msg (which includes the data they need to talk to their peers). Do a little optimization and minimize recomputation of the routing plan. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-27 15:37:16 -08:00
Ralph Castain	399de0738e	Cleanup launch Given that we only set OOB contact info from inside of events, or before we begin threaded operations (e.g., in the ess), allow set_contact_info to directly update the oob/base framework globals. Correct the nidmap regex decompression routine. Ensure that rank=1 daemon always sends back its topology as this is the most common use-case. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-25 22:06:09 -08:00
Ralph Castain	86ab751c5e	Next step in reducing launch time: begin reducing the size of the launch message itself. Start by expressing the daemon map as a set of three regular expression strings. On an 8k cluster, this reduces the nidmap contribution from over 200kBytes to 21 bytes in size. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-23 19:54:47 -08:00
Ralph Castain	188880be3f	Since static ports are only used by ORTE if the runtime option is given, there is no need for a configure option as well - so remove the --enable-orte-static-ports configure option. When decoding the daemon nidmap, mark new daemons as ALIVE by default - we will discover dead ones as we go. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-04 05:01:42 -07:00
Ralph Castain	d4327fd973	The node index isn't normally passed with the packed node object, so we need to set it on the remote end as the orted needs to pass it down to the procs. Refactor the registration code to better package proc-level info - we will separate out the node and app levels in a subsequent change.	2016-08-12 12:06:23 -07:00
Ralph Castain	06c3dfc052	Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. * Clean up the DVM so it continues to run even when applications error out and we would ordinarily abort the daemons. * Create a new errmgr component for the DVM to handle the differences. * Cleanup the DVM state component. * Add ORTE bindings directory and brief README * Pass a local tool index around to match jobs. * Pass the jobid on job completion. * Fix initialization logic. * Add framework for python wrapper. * Fix terminate-with-non-zero-exit behavior so it properly terminates only the indicated procs, notifies orte-submit, and orte-dvm continues executing. * Add some missing options to orte-dvm * Fix a bug in -host processing that caused us to ignore the #slots designator. Add a new attribute to indicate "do not expand the DVM" when submitting job spawn requests. * It actually makes no sense that we treat the termination of all children differently than terminating the children of a specific job - it only creates confusion over the difference in behavior. So terminate children the same way regardless. Extend the cmd_line utility to easily allow layering of command line definitions Catch up with ORTE interface change and make build more generic. Disable "fixed dvm" logic for now. Add another cmd_line function to merge a table of cmd line options with another one, reporting as errors any duplicate entries. Use this to allow orterun to reuse the orted_submit code Fix the "fixed_dvm" logic by ensuring we reset num_new_daemons to zero. Also ensure that the nidmap is sent with the first job so the downstream daemons get the node info. Remove a duplicate cmd line entry in orterun. Revise the DVM startup procedure to pass the nidmap only once, at the startup of the DVM. This reduces the overhead on each job launch and ensures that the nidmap doesn't get overwritten. Add new commands to get_orted_comm_cmd_str(). 
Move ORTE command line options to orte_globals.[ch]. Catch up with extra orte_submit_init parameter. Add example code. Add documentation. Bump version. The nidmap and routing data must be updated prior to propagating the xcast or else the xcast will fail. Fix the return code so it is something more expected when an error occurs. Ensure we get an error returned to us when we fail to launch for some reason. In this case, we will always get a launch_cb as we did indeed attempt to spawn it. The error code will be returned in the complete_cb. Fix the return code from orte_submit_job - it was returning the tracker index instead of "success". Take advantage of ORTE's pretty-print capabilities to provide a nice error output explaining why we failed to launch. Ensure we always get a launch_cb when we fail to launch, but no complete_cb as the job never launched. Extend the error reporting capability to job completion as well. Add index parameter to orte_submit_job(). Add orte_job_cancel and implement ORTE_DAEMON_TERMINATE_JOB_CMD. Factor out dvm termination. Parse the terminate option at tool level. Add error string for ORTE_ERR_JOB_CANCELLED. Add some safeguards. Cleanup and/of comments. Enable the return. Properly ORTE_DECLSPEC orte_submit_halt. Add orte_submit_halt and orte_submit_cancel to interface. Use the plm interface to terminate the job	2016-02-13 08:10:44 -08:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “lmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. 
This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Ralph Castain	1feaffbb15	Get the blasted singleton comm_spawn working again. There remain problems with the Slurm interaction in this use-case as the PMI components (if configured to build) try to run even when a Slurm allocation hasn't been made, but I leave that to someone else to resolve. I did, however, tell the Slurm ess to quit interfering with applications launched in this use-case by ORTE daemons, so things do work when inside a Slurm allocation. Also discovered that the rsh launcher is not picking up --enable-orterun-prefix-by-default when invoked during singleton comm_spawn, but I was unable to see why that was happening and ran out of time. cmr=v1.8.2:reviewer=rhc This commit was SVN r32229.	2014-07-13 14:47:22 +00:00
Gilles Gouaillardet	d26ac02b4a	#if OPAL_HAVE_HWLOC protect access to orte_proc_info_t.cpuset Fix a bug when trunk is configured with --without-hwloc v1.8 is safe so no cmr This commit was SVN r31957.	2014-06-06 07:25:39 +00:00
Ralph Castain	b771388fa7	We really need to send all the daemon info whenever the daemon job has changed as new daemons need a full nidmap cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31948.	2014-06-04 03:38:54 +00:00
Ralph Castain	f1978fba7c	Cleanup a set of typos on the orte_get_attribute call This commit was SVN r31942.	2014-06-03 20:36:38 +00:00
Ralph Castain	8736a1c138	Per RFC: http://www.open-mpi.org/community/lists/devel/2014/05/14822.php Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root). This commit was SVN r31916.	2014-06-01 16:14:10 +00:00
Nathan Hjelm	73bfecd650	More leak fixes. Two leaks are fixed in this commit: - Do not leak btl component list items. - Do not leak the nodename when decoding the pidmap. cmr=v1.8.2:reviewer=rhc This commit was SVN r31779.	2014-05-15 16:38:13 +00:00
Gilles Gouaillardet	5f82c391a6	Fix memory leaks in orte/util/nidmap.c This patch fixes four memory leaks in orte/util/nidmap.c : - hwloc_get_root_obj(opal_hwloc_topology)->userdata was never freed - even if bo->bytes is freed in the decode, bo was not freed - a job list is populated but never used nor freed cmr=v1.8.2:reviewer=rhc This commit was SVN r31770.	2014-05-15 08:28:53 +00:00
Ralph Castain	05590b6a8c	Correct the datastore containing the coprocessor info This commit was SVN r31677.	2014-05-07 19:29:12 +00:00
Ralph Castain	087b84b0ef	Add some further debug to the dstore framework. When doing comm_spawn, we have to exchange any provided cpu bitmaps to ensure both sides compute the same locality, else various mpi frameworks can go bonkers. This commit was SVN r31572.	2014-04-30 19:29:00 +00:00
Ralph Castain	8cda1b3dc6	Don't store cpu_bitmap unless it is non-NULL This commit was SVN r31570.	2014-04-30 18:12:48 +00:00
Ralph Castain	c4c9bc1573	As per the RFC: http://www.open-mpi.org/community/lists/devel/2014/04/14496.php Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM This commit was SVN r31557.	2014-04-29 21:49:23 +00:00
Ralph Castain	5a868028a8	Revert r31091 - the functionality didn't disappear, but moved into the MPI layer :-( This commit was SVN r31093. The following SVN revision numbers were found above: r31091 --> open-mpi/ompi@edf680855e	2014-03-17 22:30:03 +00:00
Ralph Castain	edf680855e	Restore locality computation to the nidmap code - don't know how/when it was removed, but that was not good cmr=v1.7.5:reviewer=hjelmn This commit was SVN r31091.	2014-03-17 21:59:25 +00:00
Ralph Castain	418ca60776	Since we don't know the name of the local leader, store that info under our own name :-) This commit was SVN r30777.	2014-02-20 01:39:52 +00:00
Ralph Castain	262c927778	Define a new key and store the process name of the local_rank=0 process on each node so that the MPI layer can retrieve it as desired. This commit was SVN r30759.	2014-02-18 00:32:58 +00:00
Ralph Castain	c3df744a3b	Shift the orte_db_localrank key to the opal level. Add the job and proc-level session directory names to the database using opal_db keys. This commit was SVN r30746.	2014-02-17 01:40:56 +00:00
Ralph Castain	7480beb7f0	Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it. This isn't being used yet - just enabling Nathan to do what he needs. *** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *** This commit was SVN r29708.	2013-11-14 17:01:43 +00:00
Ralph Castain	24c811805f	************************************************************** This change contains a non-mandatory modification of the MPI-RTE interface. Anyone wishing to support coprocessors such as the Xeon Phi may wish to add the required definition and underlying support ************************************************************** Add locality support for coprocessors such as the Intel Xeon Phi. Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host. So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following: 1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board 2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions 3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future. 4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. 
Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time. 5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored. 6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set. cmr:v1.7.4:reviewer=hjelmn This commit was SVN r29435.	2013-10-14 16:52:58 +00:00
Ralph Castain	5ec422dbc1	Correctly compute num local peers when launched via mpirun This commit was SVN r29327.	2013-10-02 01:46:09 +00:00
Ralph Castain	d565a76814	Do some cleanup of the way we handle modex data. Identify data that needs to be shared with peers in my job vs data that needs to be shared with non-peers - no point in sharing extra data. When we share data with some process(es) from another job, we cannot know in advance what info they have or lack, so we have to share everything just in case. This limits the optimization we can do for things like comm_spawn. Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create. cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data This commit was SVN r29274.	2013-09-27 00:37:49 +00:00
Ralph Castain	63d10d2d0d	Fix typo Refs trac:3729 This commit was SVN r29057. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-22 16:05:58 +00:00
Ralph Castain	16c5b30a1f	Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch. So shift the cutoff param to the MPI layer, and have it solely determine whether or not we call modex_recv on the hostname. If comm_world is of size greater than the cutoff, then we don't automatically retrieve the hostname when we build the ompi_proc_t for a process - instead, we fill the hostname entry on first call to modex_recv for that process. The param is now "ompi_hostname_cutoff=N", where N=number of procs for cutoff. Refs trac:3729 This commit was SVN r29056. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-22 03:40:26 +00:00
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00

162 commits