openmpi

Автор	SHA1	Сообщение	Дата
Ralph Castain	7a21661785	Silence a warning when --without-hwloc is used This commit was SVN r28783.	2013-07-13 17:17:17 +00:00
Dave Goodell	3741d62308	fix --without-hwloc build failure All builds since r28682 configured with '--without-hwloc' fail at "make" time without this fix. Reviewed by rhc@ This commit was SVN r28769. The following SVN revision numbers were found above: r28682 --> open-mpi/ompi@446e33a5d8	2013-07-12 17:21:14 +00:00
Ralph Castain	62378209f0	Even if we don't find the default hostfile, and nothing else was provided, then use all the known nodes. cmr:v1.7.3:#3653:reviewer=jsquyres cmr:v1.6.6:#3654:reviewer=jsquyres This commit was SVN r28718.	2013-07-03 22:31:32 +00:00
Ralph Castain	443a6802b9	If the default hostfile is empty, we need to pickup all the known nodes, not just the head node. cmr:v1.7.3:reviewer=jsquyres cmr:v1.6.6:reviewer=jsquyres This commit was SVN r28717.	2013-07-03 22:25:51 +00:00
Ralph Castain	446e33a5d8	There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes. To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it. This commit was SVN r28682.	2013-06-27 03:04:50 +00:00
Ralph Castain	a51a0a8c48	Fix uninitialized var This commit was SVN r28652.	2013-06-18 22:41:47 +00:00
Jeff Squyres	6d173af329	This commit introduces a new "mindist" ORTE RMAPS mapper, as well as some relevant updates/new functionality in the opal/mca/hwloc and orte/mca/rmaps bases. This work was mainly developed by Mellanox, with a bunch of advice from Ralph Castain, and some minor advice from Brice Goglin and Jeff Squyres. Even though this is mainly Mellanox's work, Jeff is committing only for logistical reasons (he holds the hg+svn combo tree, and can therefore commit it directly back to SVN). ----- Implemented distance-based mapping algorithm as a new "mindist" component in the rmaps framework. It allows mapping processes by NUMA due to PCI locality information as reported by the BIOS - from the closest to device to furthest. To use this algorithm, specify: {{{mpirun --map-by dist:<device_name>}}} where <device_name> can be mlx5_0, ib0, etc. There are two modes provided: 1. bynode: load-balancing across nodes 1. byslot: go through slots sequentially (i.e., the first nodes are more loaded) These options are regulated by the optional ''span'' modifier; the command line parameter looks like: {{{mpirun --map-by dist:<device_name>,span}}} So, for example, if there are 2 nodes, each with 8 cores, and we'd like to run 10 processes, the mindist algorithm will place 8 processes to the first node and 2 to the second by default. But if you want to place 5 processes to each node, you can add a span modifier in your command line to do that. If there are two NUMA nodes on the node, each with 4 cores, and we run 6 processes, the mindist algorithm will try to find the NUMA closest to the specified device, and if successful, it will place 4 processes on that NUMA but leaving the remaining two to the next NUMA node. You can also specify the number of cpus per MPI process. This option is handled so that we map as many processes to the closest NUMA as we can (number of available processors at the NUMA divided by number of cpus per rank) and then go on with the next closest NUMA. The default binding option for this mapping is bind-to-numa. It works if you don't specify any binding policy. But if you specified binding level that was "lower" than NUMA (i.e hwthread, core, socket) it would bind to whatever level you specify. This commit was SVN r28552.	2013-05-22 13:04:40 +00:00
Ralph Castain	5296099ecb	Fix the cpus-per-rank when binding to hwthreads. Add cpus-per-rank to diag printout Thanks to Elena for reporting the problem This commit was SVN r28508.	2013-05-14 20:17:50 +00:00
Ralph Castain	427b6b0b47	Fix the verbosity of yet another framework...sigh. This commit was SVN r28481.	2013-05-13 14:36:32 +00:00
Ralph Castain	5d7a93c032	Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed. Interesting how many includes had to be fixed here and there to fill in missing dependencies :-) This commit was SVN r28411.	2013-04-29 17:02:37 +00:00
Ralph Castain	252147fba6	Cleanup error message if unknown host is given in -host and -hostfile options This commit was SVN r28262.	2013-03-28 16:52:10 +00:00
Nathan Hjelm	c041156f60	Update ORTE frameworks to use the MCA framework system. This commit was SVN r28240.	2013-03-27 21:14:43 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file can not be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int , or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few component actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
Ralph Castain	256414121e	Protect the cpus-per-rank MCA param registration so that --without-hwloc will build This commit was SVN r28232.	2013-03-27 14:53:30 +00:00
Ralph Castain	317915225c	Finish the binding cleanup by removing the no-longer-used binding level scheme. This proved to be fallible as there is no guarantee that the hierarchy it used matched physical reality of the machine (e.g., is L3 "above" the socket or not). Still have to complete the ppr update, but get the rest of it correct. This commit was SVN r28223.	2013-03-26 20:09:49 +00:00
Ralph Castain	6ee32767d4	Restore the cpus-per-proc option for byslot and bynode mapping. Remove the bind_idx (which recorded the index of the hwloc object where the proc was bound) as this would no longer be unique, and just use the bitmap as the standard reference for location. Update the relative locality computation to take bitmaps as its argument. This commit was SVN r28219.	2013-03-26 18:27:50 +00:00
Ralph Castain	2f43989d22	Add debug and handle the use-case where someone (a) uses a hostfile while in a managed allocation to sub-allocate runs, and (b) includes the HNP's node in one of those hostfiles. cmr:v1.7 This commit was SVN r28203.	2013-03-22 00:53:33 +00:00
Ralph Castain	cf9796accd	Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes This commit was SVN r28134.	2013-02-28 01:35:55 +00:00
Ralph Castain	112f8eedb1	Handle the case where rankfile is providing the allocation This commit was SVN r27971.	2013-01-29 20:37:58 +00:00
Nathan Hjelm	bdedd8b0d3	Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1. Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality. This commit was SVN r27570.	2012-11-06 19:09:26 +00:00
Ralph Castain	285a3b168d	Add an ability to specify the max number of simultaneous procs/node for an application when operating in staged mode. Change some debug statements from OPAL_OUTPUT_VERBOSE to opal_output_verbose so they are available in optimized builds. This commit was SVN r27445.	2012-10-14 03:31:32 +00:00
Ralph Castain	d95025f53a	Ensure we clear the usage numbers when binding on multiple nodes so we don't "carry over" info from one node to the next. Use the same tracking mechanism for binding upwards and in-place to avoid doing a bunch of mallocs. Refs trac:3322 This commit was SVN r27356. The following Trac tickets were found above: Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322	2012-09-20 15:16:06 +00:00
Ralph Castain	a3060cdd15	Fix the bind_downward code - it was incorrectly looking across the entire node instead of only looking below the locale to which the proc had been assigned. In other words, if the proc was mapped to a core, then the only hwthreads that should be considered for binding are those directly below that core. The binding algo was incorrectly looking at ALL hwthreads in that scenario, causing the proc to be bound to an HT outside of the mapped location. This now results in the procs being bound within their assigned location. It also causes us to use only the 0th HT on a core unless --use-hwthread-cpus has been specified (in which case, we use all the HTs in a core). Bind to core binds you to all HTs regardless - the --use-hwthread-cpus only impacts the oversubscribed determination and when binding to HT. cmr:v1.7 This commit was SVN r27342.	2012-09-14 22:01:19 +00:00
Ralph Castain	ca40cb5f1c	Fix comm_spawn by mpirun This commit was SVN r27285.	2012-09-10 17:09:25 +00:00
Ralph Castain	efa50346c8	Error out if we are filtering a hostfile and encounter a node that is not in the resource-managed allocation, giving an error message identifying the file and the node. Don't filter managed allocations thru a default hostfile as this can lead to "hidden" errors. Don't use dash-host info on managed allocations if we using soft locations This commit was SVN r27245.	2012-09-05 19:42:00 +00:00
Ralph Castain	d772e0fc3d	Add an option to treat dash-host specifications as "requested, but not required". So-called "soft" location requests can allow an application to execute even if the ideal allocation isn't available. This commit was SVN r27242.	2012-09-05 18:42:09 +00:00
Ralph Castain	fde83a44ab	This confusion has been around for awhile, caused by a long-ago decision to track slots allocated to a specific job as opposed to allocated to the overall mpirun instance. We eliminated that quite a while ago, but never consolidated the "slots_alloc" and "slots" fields in orte_node_t. As a result, confusion has grown in the code base as to which field to look at and/or update. So (finally) consolidate these two fields into one "slots" field. Add a field in orte_job_t to indicate when all the procs for a job will be launched together, so that staged operations can know when MPI operations are allowed. This commit was SVN r27239.	2012-09-05 01:30:39 +00:00
Ralph Castain	11de735e8a	Complete the revamp of hostfile support in non-managed environments. Working at the app level, ensure that we utilize only those nodes specified for that app, but fall back to the default hostfile (if available) for those with no specification, further falling back to the local host if the default hostfile is not present or is empty. This commit was SVN r27230.	2012-09-04 16:34:05 +00:00
Ralph Castain	66c3f5d18d	When getting target nodes for mapping, there is a difference between not finding any nodes that match the required constraints (either in hostfile or dash-host filtering) and finding at least one such node, but all its slots are busy. Make the return code reflect this difference so the caller can take appropriate action. This commit was SVN r27213.	2012-09-01 10:30:40 +00:00
Ralph Castain	95019cc310	Fix a few places where we weren't completely identifying hostfile-based operations against "localhost" entries. Tell the mapper base to be silent when we don't want errors announced because nodes aren't available for mapping (something it is okay if they are fully used). Fix an infinite loop in the file prepositioning code. This commit was SVN r27210.	2012-08-31 21:28:49 +00:00
Ralph Castain	1b659de132	Get staged execution working on multi-node setups. Improve efficiency by only remapping if all procs not yet mapped in the job. This commit was SVN r27181.	2012-08-29 20:35:52 +00:00
Ralph Castain	98580c117b	Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine. Remove some stale configure.m4's we no longer need. Optimize the nidmaps a bit by only sending info that has changed each time, instead of sending a complete copy of everything. Makes no difference for the typical MPI job - only impacts things like staged execution where we are sending multiple (possibly many) launch messages. This commit was SVN r27165.	2012-08-28 21:20:17 +00:00
Ralph Castain	aadfe1b61e	Fix a missing test that breaks novm operation. CMR:v1.7 This commit was SVN r27163.	2012-08-28 21:13:57 +00:00
Ralph Castain	cb48fd52d4	Implement the MPI_Info part of MPI-3 Ticket 313. Add an MPI_info object MPI_INFO_GET_ENV that contains a number of run-time related pieces of info. This includes all the required ones in the ticket, plus a few that specifically address recent user questions: "num_app_ctx" - the number of app_contexts in the job "first_rank" - the MPI rank of the first process in each app_context "np" - the number of procs in each app_context Still need clarification on the MPI_Init portion of the ticket. Specifically, does the ticket call for returning an error is someone calls MPI_Init more than once in a program? We set a flag to tell us that we have been initialized, but currently never check it. This commit was SVN r27005.	2012-08-12 01:28:23 +00:00
George Bosilca	ba879c2c51	Remove the unused map. This commit was SVN r26960.	2012-08-07 12:06:13 +00:00
Ralph Castain	53b1a1c976	Cleanly error out when someone asks to map-to <object> if that object doesn't exist on a node. This commit was SVN r26950.	2012-08-04 21:52:36 +00:00
Ralph Castain	96f6f94c24	Ensure we don't get trapped in an infinite loop when ranking bynode if something isn't right This commit was SVN r26948.	2012-08-03 21:45:10 +00:00
Ralph Castain	431d5361ed	For those who really preferred our prior mode of operation that mapped procs and only launched daemons on the nodes that had procs on them, introduce the "novm" state machine component. This recreates the old mode of operation by re-ordering the launch sequence so that we allocate, then map, and then launch daemons only on the reqd nodes (instead of across the entire allocation). This commit was SVN r26946.	2012-08-03 16:30:05 +00:00
Shiqing Fan	12d99a9ebb	Update the hwloc build on Windows and related files. This commit was SVN r26818.	2012-07-20 12:14:28 +00:00
Jeff Squyres	2ba10c37fe	Per RFC, bring in the following changes: * Remove paffinity, maffinity, and carto frameworks -- they've been wholly replaced by hwloc. * Move ompi_mpi_init() affinity-setting/checking code down to ORTE. * Update sm, smcuda, wv, and openib components to no longer use carto. Instead, use hwloc data. There are still optimizations possible in the sm/smcuda BTLs (i.e., making multiple mpools). Also, the old carto-based code found out how many NUMA nodes were ''available'' -- not how many were used ''in this job''. The new hwloc-using code computes the same value -- it was not updated to calculate how many NUMA nodes are used ''by this job.'' * Note that I cannot compile the smcuda and wv BTLs -- I ''think'' they're right, but they need to be verified by their owners. * The openib component now does a bunch of stuff to figure out where "near" OpenFabrics devices are. '''THIS IS A CHANGE IN DEFAULT BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors (I do not have a NUMA machine with an OpenFabrics device that is a non-uniform distance from multiple different NUMA nodes). * Completely rewrite the OMPI_Affinity_str() routine from the "affinity" mpiext extension. This extension now understands hyperthreads; the output format of it has changed a bit to reflect this new information. * Bunches of minor changes around the code base to update names/types from maffinity/paffinity-based names to hwloc-based names. * Add some helper functions into the hwloc base, mainly having to do with the fact that we have the hwloc data reporting ''all'' topology information, but sometimes you really only want the (online \| available) data. This commit was SVN r26391.	2012-05-07 14:52:54 +00:00
Ralph Castain	bd8b4f7f1e	Sorry for mid-day commit, but I had promised on the call to do this upon my return. Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code. Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch. This commit was SVN r26242.	2012-04-06 14:23:13 +00:00
Ralph Castain	811413e9bc	Correctly handle multiple cpu-set ranges. Correctly support optional binding directives combined with cpu-set. This commit was SVN r26187.	2012-03-23 14:50:41 +00:00
Ralph Castain	ce0caf7567	Support -cpu-set by binding to the specified cpus in the absence of any other binding directive. Allows users to subdivide nodes for multiple parallel mpirun invocations. This commit was SVN r26186.	2012-03-23 14:05:52 +00:00
Ralph Castain	bba6508b4b	Handle the default hostfile case a little better... This commit was SVN r25928.	2012-02-15 03:33:49 +00:00
Ralph Castain	f14c4be580	Correct the ordering logic so the list gets correctly built in daemon vpid order This commit was SVN r25818.	2012-01-30 16:25:07 +00:00
Shiqing Fan	bfbd3c67a5	Add a windows file into the tarball. This commit was SVN r25811.	2012-01-29 10:12:02 +00:00
Ralph Castain	3f31feee6f	Handle the case where a user's rankfile specifies only cpus, and not socket:cpu pairs. This commit was SVN r25803.	2012-01-27 12:21:45 +00:00
Ralph Castain	ef94e606c7	Add some debug This commit was SVN r25791.	2012-01-26 19:23:32 +00:00
Shiqing Fan	2c9a4beffd	Add and remove a few components for windows build. This commit was SVN r25775.	2012-01-25 09:01:27 +00:00
Ralph Castain	477582abef	Grrrr....fix ALL the cases where the membind warning occurs. This commit was SVN r25715.	2012-01-11 23:51:18 +00:00

1 2 3 4 5 ...

309 Коммитов