When using the native aprun launcher, it was observed that
there were frequent memory corruption errors occurring either
during a PMI kvs-fence operation, or at MPI termination during
opal cleanup of allocated objects. This was especially bad
when using
    aprun --c none
In some cases, the application would even just hang in finalize
if using ptmalloc, owing to some kind of infinite loop in
cleanup of small blocks, etc.
It turns out that the problem was in orte_ess_base_proc_binding's
improper use of opal_hwloc_base_get_available_cpus. The cpuset
(bitmap) returned from that function is not meant to be freed
by the caller.
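For reference, a minimal sketch of the ownership rule behind the fix (the call is paraphrased from the commit text, not copied from orte_ess_base_proc_binding; the exact signature may differ):

    #include "opal/mca/hwloc/base/base.h"   /* opal_hwloc_base_get_available_cpus() */

    /* The cpuset returned here points into hwloc's cached topology data,
     * so the caller must not free it. */
    hwloc_cpuset_t avail;

    avail = opal_hwloc_base_get_available_cpus(opal_hwloc_topology, obj);
    /* ... use avail to compute the desired binding ... */

    /* WRONG - freeing the cached bitmap corrupts the heap later on:
     *     hwloc_bitmap_free(avail);
     * RIGHT - simply drop the local reference; hwloc owns the memory. */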
This problem is likely never observed when using the mpirun launcher
as there's an early exit if the OMPI_MCA_orte_bound_at_launch
environment variable is set.
This commit was SVN r32809.
Do not call "exit" during RTE abort, as that is happening in a thread and (at least in some
environments) doesn't result in the main thread being immediately
terminated. Instead, we wind up going thru orte_finalize in the main
thread, which isn't what we want.
So replace the call to "exit" with the "quick exit" variant "_exit", which
causes the entire process to exit immediately.
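A minimal sketch of the distinction (the real abort path lives in ORTE; only the exit-vs-_exit choice is shown here):

    #include <stdlib.h>    /* exit(): runs atexit handlers, flushes stdio, ... */
    #include <unistd.h>    /* _exit(): terminates the whole process at once    */

    static void abort_from_helper_thread(int status)
    {
        /* exit(status);   <- unwinds cleanup in this thread and, in some
         *                    environments, lets the main thread keep running
         *                    into orte_finalize, which is not what we want  */
        _exit(status);     /* quick exit: kill the entire process immediately */
    }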
(custom patch has been posted for 1.8.3)
This commit was SVN r32780.
The ess pmi module was not handling aprun-launched
daemons: all daemons thought they were vpid 1.
Also, it turns out that on Cray systems using MOM nodes
for launched jobs, just detecting whether or not a
process is in a PAGG container is not sufficient.
Crank up the priority of the alps PLM component in the
event that the configure detected the presence of both
slurm and alps.
Have the ESS pmi component open the pmix framework and
select a pmix component.
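A hedged sketch of the usual MCA open/select idiom this implies (the exact symbols used by ess/pmi are assumed here, not copied from the tree):

    int ret;

    if (OPAL_SUCCESS != (ret = mca_base_framework_open(&opal_pmix_base_framework, 0))) {
        ORTE_ERROR_LOG(ret);
        return ret;
    }
    if (OPAL_SUCCESS != (ret = opal_pmix_base_select())) {
        ORTE_ERROR_LOG(ret);
        return ret;
    }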
This commit was SVN r32773.
WHAT: Merge the PMIx branch into the devel repo, creating a new
OPAL “pmix” framework to abstract PMI support for all RTEs.
Replace the ORTE daemon-level collectives with a new PMIx
server and update the ORTE grpcomm framework to support
server-to-server collectives
WHY: We’ve had problems dealing with variations in PMI implementations,
and need to extend the existing PMI definitions to meet exascale
requirements.
WHEN: Mon, Aug 25
WHERE: https://github.com/rhc54/ompi-svn-mirror.git
Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.
All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.
Accordingly, we have:
* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.
* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.
* Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.
* removed the prior OMPI/OPAL modex code
* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform (see the sketch after this list).
* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand
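Sketch of the decision the new send macros encode; every name below is a stand-in invented for illustration, not the actual macro internals:

    #include <stdbool.h>
    #include <stdio.h>

    static bool btl_supports_async_modex = true;    /* declared by the BTL            */
    static bool pmix_has_nonblocking_fence = true;  /* active pmix component supports */

    static void fence_nb(void) { printf("non-blocking fence posted\n"); }
    static void fence(void)    { printf("full blocking modex exchange\n"); }

    int main(void)
    {
        if (btl_supports_async_modex && pmix_has_nonblocking_fence) {
            fence_nb();   /* data is retrieved later, on demand */
        } else {
            fence();      /* default: blocking exchange, as performed today */
        }
        return 0;
    }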
This commit was SVN r32570.
do not invoke orte_session_dir_finalize(...) so
orte_ess_base_app_abort(...) can successfully create
<orte_process_info.proc_session_dir>/aborted
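A sketch of why the session directory must survive until the abort path runs (the marker creation is paraphrased; the actual code in orte_ess_base_app_abort may differ):

    #include <fcntl.h>                  /* open()            */
    #include <stdio.h>                  /* asprintf()        */
    #include <stdlib.h>                 /* free()            */
    #include <unistd.h>                 /* close()           */
    #include "orte/util/proc_info.h"    /* orte_process_info */

    char *abort_file = NULL;
    int fd;

    asprintf(&abort_file, "%s/aborted", orte_process_info.proc_session_dir);
    fd = open(abort_file, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (0 <= fd) {
        close(fd);
    } else {
        /* if orte_session_dir_finalize() already ran, the directory is
         * gone and the "aborted" marker can never be created */
    }
    free(abort_file);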
cmr=v1.8.2:reviewer=rhc
This commit was SVN r32498.
The following SVN revision numbers were found above:
r32460 --> open-mpi/ompi@abedb97be4
Also discovered that the rsh launcher is not picking up --enable-orterun-prefix-by-default when invoked during singleton comm_spawn, but I was unable to see why that was happening and ran out of time.
cmr=v1.8.2:reviewer=rhc
This commit was SVN r32229.
We have been getting several requests for new collectives that need to be inserted in various places of the MPI layer, all in support of either checkpoint/restart or various research efforts. Until now, this would require that the collective id's be generated at launch, which required modifications to ORTE and other places. We chose not to make collectives reusable as the race conditions associated with resetting collective counters are daunting.
This commit extends the collective system to allow self-generation of collective id's that the daemons need to support, thereby allowing developers to request any number of collectives for their work. There is one restriction: RTE collectives must occur at the process level - i.e., we don't currently have a way of tagging the collective to a specific thread. From the comment in the code:
* In order to allow scalable
* generation of collective id's, they are formed as:
*
* top 32-bits are the jobid of the procs involved in
* the collective. For collectives across multiple jobs
* (e.g., in a connect_accept), the daemon jobid will
* be used as the id will be issued by mpirun. This
* won't cause problems because daemons don't use the
* collective_id
*
* bottom 32-bits are a rolling counter that recycles
* when the max is hit. The daemon will cleanup each
* collective upon completion, so this means a job can
* never have more than 2**32 collectives going on at
* a time. If someone needs more than that - they've got
* a problem.
*
* Note that this means (for now) that RTE-level collectives
* cannot be done by individual threads - they must be
* done at the overall process level. This is required as
* there is no guaranteed ordering for the collective id's,
* and all the participants must agree on the id of the
* collective they are executing. So if thread A on one
* process asks for a collective id before thread B does,
* but B asks before A on another process, the collectives will
* be mixed and not result in the expected behavior. We may
* find a way to relax this requirement in the future by
* adding a thread context id to the jobid field (maybe taking the
* lower 16-bits of that field).
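A compact illustration of that layout (the helper and the use of a plain uint64_t are assumptions made for this sketch, not the actual ORTE definitions):

    #include <stdint.h>

    static uint32_t coll_id_counter = 0;   /* rolling counter, wraps at 2**32 */

    static uint64_t next_coll_id(uint32_t jobid)
    {
        /* top 32 bits: jobid of the participating procs (the daemon/mpirun
         * jobid for cross-job collectives); bottom 32 bits: rolling counter */
        return ((uint64_t)jobid << 32) | (uint64_t)coll_id_counter++;
    }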
This commit includes a test program (orte/test/mpi/coll_test.c) that cycles 100 times across barrier and modex collectives.
This commit was SVN r32203.
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php
Revamp the ORTE global data structures to reduce memory footprint and add new features. Add the ability to control/set cpu frequency, though this can only be done if the sys admin has set up the system to support it (or you run as root).
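For the frequency control, a hedged sketch of the usual Linux mechanism (cpufreq sysfs); whether the new ORTE code uses exactly this path is an assumption, and it only works where the admin has enabled the userspace governor or the caller is root:

    #include <stdio.h>

    /* hypothetical helper, not the actual ORTE code */
    static int set_cpu_freq_khz(int core, unsigned long khz)
    {
        char path[128];
        FILE *fp;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", core);
        if (NULL == (fp = fopen(path, "w"))) {
            return -1;    /* not supported, or insufficient permission */
        }
        fprintf(fp, "%lu\n", khz);
        fclose(fp);
        return 0;
    }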
This commit was SVN r31916.
grpcomm: fix memory leaks
We were leaking the caddy object used to pass data to the callback
function. This commit fixes these leaks.
oob,rml: fix memory leaks
This commit fixes several leaks:
- Both the oob/base and oob/tcp were leaking objects on their peer
  hash tables. Iterate on the hash tables and free any objects (see
  the sketch after this list).
- Sent messages were leaked because of a missing OBJ_RELEASE. I placed
  the release in ORTE_RML_SEND_COMPLETE to catch all the possible
  paths.
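A hedged sketch of the peer-table cleanup pattern described above (variable names and key width are assumed; the real oob/tcp code may differ):

    #include "opal/class/opal_hash_table.h"
    #include "opal/class/opal_object.h"

    uint64_t key;
    void *value, *node, *next_node;
    int rc;

    rc = opal_hash_table_get_first_key_uint64(&peers, &key, &value, &node);
    while (OPAL_SUCCESS == rc) {
        OBJ_RELEASE(value);   /* free the per-peer object that was leaking */
        rc = opal_hash_table_get_next_key_uint64(&peers, &key, &value,
                                                 node, &next_node);
        node = next_node;
    }
    OBJ_DESTRUCT(&peers);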
ess/base: close the state framework
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31776.
top_ompi_srcdir -> OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR
We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.
The only thing left is ompilibdir being treated similarly to what we did for srcdir/builddir. Coming soon.
This commit was SVN r31678.
Fixes trac:4596
Reviewed by rhc, RM-approved
cmr=v1.8.2:reviewer=ompi-gk1.8
This commit was SVN r31626.
The following Trac tickets were found above:
Ticket 4596 --> https://svn.open-mpi.org/trac/ompi/ticket/4596
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php
Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM.
This commit was SVN r31557.
Child processes now look clean; I can't find any more fd's that are
leaking from the parent to children.
Refs trac:4550
This commit was SVN r31515.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
This provides full locality - i.e., not just node-level, but all the way down to whatever common binding level exists between the procs.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31106.
Change the priority of comm_failure and job_termination events to ensure we process final messages prior to terminating. Check for termination conditions when processing proc termination events as we may order proc termination when the daemon gets an exit command, but we can't see the proc actually terminate until we get out of that message event.
Jeff: probably easiest to review this by testing. I tested it under both Slurm and rsh on v1.7.5 as well as trunk
cmr=v1.7.5:reviewer=jsquyres:subject=resolve event priorities during VM shutdown
This commit was SVN r31042.
Running orte-restart requires an initialized sstore.
This opens the sstore component for FT builds just like
the snapc component.
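A hedged sketch of the FT-only open/select this adds, mirroring the existing snapc handling (the framework and select symbols are assumed from the usual ORTE naming conventions):

    #if OPAL_ENABLE_FT_CR == 1
        if (ORTE_SUCCESS != (ret = mca_base_framework_open(&orte_sstore_base_framework, 0))) {
            ORTE_ERROR_LOG(ret);
            return ret;
        }
        if (ORTE_SUCCESS != (ret = orte_sstore_base_select())) {
            ORTE_ERROR_LOG(ret);
            return ret;
        }
    #endif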
This commit was SVN r30796.