Josh Hursey
e4f2d03d28
ErrMgr Framework redesign to better support fault tolerance development activities.
Explained in more detail in the following RFC:
http://www.open-mpi.org/community/lists/devel/2010/03/7589.php
This commit was SVN r22872.
2010-03-23 21:28:02 +00:00
Ralph Castain
0b9552cd4e
Expand the ESS framework's API to include a new function "query_sys_info" that allows the caller to retrieve key-value pairs of info on the local system capabilities (e.g., cpu type/model). Have each daemon and the HNP "sense" that information and provide it to their local procs to avoid having every proc query the system directly.
This commit was SVN r22870.
2010-03-23 20:47:41 +00:00
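A minimal sketch of the key-value lookup idea. It assumes the daemon has already sensed the local system info and handed it to its procs as a simple table. The names `sketch_keyval_t` and `sketch_query_sys_info` and the key strings are hypothetical. The actual ESS `query_sys_info` signature is not shown in this log.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value record for one piece of local system info. */
typedef struct {
    const char *key;    /* e.g. "cpu_type", "cpu_model" (illustrative keys) */
    const char *value;  /* NULL if the info was not determined */
} sketch_keyval_t;

/* Fill in a value for each requested key from the table the local daemon
 * sensed once, so the proc never probes the system itself.
 * Returns the number of requested keys that were found with a value. */
static int sketch_query_sys_info(const sketch_keyval_t *sensed, size_t nsensed,
                                 sketch_keyval_t *req, size_t nreq)
{
    int found = 0;
    for (size_t i = 0; i < nreq; i++) {
        req[i].value = NULL;
        for (size_t j = 0; j < nsensed; j++) {
            if (strcmp(req[i].key, sensed[j].key) == 0) {
                req[i].value = sensed[j].value;
                if (req[i].value != NULL) {
                    found++;
                }
                break;
            }
        }
    }
    return found;
}
```

Keys the daemon did not sense come back as NULL, so the caller can fall back to its own defaults.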
Ralph Castain
9a5fdbb622
Continue development of reliable multicast
This commit was SVN r22616.
2010-02-14 19:20:56 +00:00
Ralph Castain
09763ec711
Since we modified ORTE to declare that any process that terminates after calling "init" while at least one other process has not yet called "init" is an error, we have to ensure that non-MPI ORTE apps (i.e., apps that call orte_init but not mpi_init) include a barrier in orte_init. Otherwise, fast ORTE apps almost always wind up triggering the "abnormal termination" condition.
The barrier is protected with a test to ensure that MPI apps don't execute it and wind up doing two barriers during their init.
This commit was SVN r22378.
2010-01-07 06:58:01 +00:00
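A toy model of the guard described above, with a counter standing in for a real collective barrier. All names are hypothetical. The sketch only shows the intended outcome. A non-MPI app gets exactly one barrier from the init path. An MPI app also gets exactly one barrier rather than two, because the commit message says it already performs a barrier during its own init.

```c
#include <stdbool.h>

/* Hypothetical init state; stands in for whatever ORTE actually consults. */
typedef struct {
    bool is_mpi_app;   /* set when init is entered through the MPI init path */
    int  barriers_run; /* counts barrier calls so the effect is visible */
} sketch_init_state_t;

/* Stand-in for a real barrier across all procs in the job. */
static void sketch_barrier(sketch_init_state_t *s)
{
    s->barriers_run++;
}

static void sketch_orte_init(sketch_init_state_t *s)
{
    /* ... normal ORTE setup would happen here ... */
    if (!s->is_mpi_app) {
        sketch_barrier(s);   /* only non-MPI apps barrier here */
    }
}

static void sketch_mpi_init(sketch_init_state_t *s)
{
    s->is_mpi_app = true;
    sketch_orte_init(s);     /* skips the guarded barrier */
    sketch_barrier(s);       /* the barrier the MPI init path does itself */
}
```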
Ralph Castain
ef1bfaa823
Add the ability to track how many times a process has been restarted, and to communicate that value to a process when it is restarted in case it needs to take action when it is restarted as opposed to being started for the first time.
This commit was SVN r22377.
2010-01-07 01:19:44 +00:00
Ralph Castain
06d1f2cfe2
Add some new tests to the ORTE collection
This commit was SVN r22328.
2009-12-17 19:30:57 +00:00
Ralph Castain
4026a9c873
Update all the tests to the new orte_init API
This commit was SVN r22263.
2009-12-04 04:31:06 +00:00
Ralph Castain
4a82dd9a45
Add message sequence numbers to multicast messages, tracked by channel
This commit was SVN r22262.
2009-12-04 04:17:44 +00:00
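A rough sketch of per-channel sequence numbering. It assumes a small fixed table of channels indexed by an integer id. The names are hypothetical, and 32-bit wraparound is ignored. This commit does not describe how the real rmcast module reacts to a detected gap, for example by requesting a resend. The sketch therefore only counts missed messages.

```c
#include <stdint.h>

#define SKETCH_MAX_CHANNELS 16   /* hypothetical fixed channel table size */

/* Each channel numbers its own messages independently of the others. */
typedef struct {
    uint32_t next_send[SKETCH_MAX_CHANNELS];     /* seq for next outgoing msg */
    uint32_t expected_recv[SKETCH_MAX_CHANNELS]; /* seq expected on next incoming msg */
} sketch_seq_tracker_t;

/* Returns the sequence number to stamp on an outgoing message on `channel`. */
static uint32_t sketch_next_seq(sketch_seq_tracker_t *t, int channel)
{
    return t->next_send[channel]++;
}

/* Records an incoming seq on `channel` and returns how many messages appear
 * to have been skipped (0 if in order). A stale or duplicate seq returns 0
 * and leaves the expectation unchanged. */
static uint32_t sketch_recv_seq(sketch_seq_tracker_t *t, int channel, uint32_t seq)
{
    uint32_t expected = t->expected_recv[channel];
    if (seq < expected) {
        return 0;
    }
    t->expected_recv[channel] = seq + 1;
    return seq - expected;
}
```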
Ralph Castain
ae3e9f2aee
Update the spin.c test
This commit was SVN r22259.
2009-12-03 04:46:31 +00:00
Ralph Castain
93ebed48b1
Update the multicast test. Some cleanups to the basic rmcast module
This commit was SVN r22257.
2009-12-03 04:30:58 +00:00
Ralph Castain
a0d5c80ce0
Add a new framework for discovering local resource information such as cpu type/model, #cpus, available physical memory, etc. Two initial components (darwin and linux) are provided. This is needed to support bootstrap operations where daemons are started at node boot, and applications where initial knowledge of cpu identification is needed to guide framework component selection.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.
If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.
Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).
Adjust some platform files to enable these capabilities.
This commit was SVN r22244.
2009-11-30 23:11:25 +00:00
Ralph Castain
92733b13d9
Add a couple of new tests to the orte system.
Modify the job_complete check so we don't kill jobs when a single proc was terminated by ORTE command via plm.terminate_procs
Still dies gracefully with a ctrl-c, and behaves as before when using plm.terminate_job
This commit was SVN r22227.
2009-11-20 01:47:49 +00:00
Ralph Castain
840766a894
Update the rmcast APIs to include tag params and reorder them to look like their rml cousins
This commit was SVN r22218.
2009-11-17 15:58:59 +00:00
Ralph Castain
a2f3a47b92
Update the orte_mcast test
This commit was SVN r22214.
2009-11-11 22:11:19 +00:00
Ralph Castain
7afd65d631
Add a couple of test programs
This commit was SVN r22137.
2009-10-24 01:00:38 +00:00
Ralph Castain
b7a0125bb7
Add a test for the new opal if.c functions. Modify the multicast test
This commit was SVN r22081.
2009-10-09 15:25:18 +00:00
Ralph Castain
26bb6e8f79
Add a couple of non-orte multicast tests
This commit was SVN r22001.
2009-09-23 05:24:22 +00:00
Ralph Castain
c3f9096fd9
Add a reliable multicast framework, with an initial basic module. This is configured out unless specifically requested via --enable-multicast.
This commit was SVN r21988.
2009-09-22 00:58:29 +00:00
Ralph Castain
82af6ee940
Update test
This commit was SVN r21987.
2009-09-22 00:55:02 +00:00
Ralph Castain
0e528e994f
Revert last commit - went to wrong repo!
Didn't we just have that happen the other day too? :-)
This commit was SVN r21878.
2009-08-25 13:06:14 +00:00
Ralph Castain
15d12b240b
Sync to r21876
This commit was SVN r21877.
The following SVN revision numbers were found above:
r21876 --> open-mpi/ompi@ef970293f0
2009-08-25 13:04:12 +00:00
Ralph Castain
511fe5da8b
Minor cleanup - get reported iters right in test
This commit was SVN r21819.
2009-08-14 03:33:59 +00:00
Ralph Castain
9da6b46e7d
Add several options to the sendrecv_blaster to make it more powerful
This commit was SVN r21817.
2009-08-14 03:12:43 +00:00
Ralph Castain
007cbe74f4
Include the sendrecv blaster in the tarball
This commit was SVN r21790.
2009-08-11 02:42:47 +00:00
Rainer Keller
76469ea64a
- Change the property of a few files that obviously
don't need to be svn:executable...
This commit was SVN r21786.
2009-08-11 01:40:00 +00:00
Ralph Castain
c66a5a9504
Add another test that just blasts the system with MPI_Sendrecv-to-self commands of varying sizes
This commit was SVN r21748.
2009-07-31 14:57:03 +00:00
Ralph Castain
ef20e778b3
Ensure that output ends on an appropriate suffix tag when --tag-output or --xml are selected.
When we read the input buffer, we don't always get a complete printf output - we sometimes end mid-stream. We still need to add the suffix and a <CR> to keep the output working right.
This commit was SVN r21706.
2009-07-17 05:02:53 +00:00
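The fix above can be sketched as a function that tags each line of a chunk read from a child's stdout. A trailing partial line is still closed with the suffix and a newline. The prefix and suffix strings and the function name are hypothetical. The real IOF code is not shown here.

```c
#include <stdio.h>
#include <string.h>

/* Wrap each line of `chunk` as prefix + line + suffix + '\n'. If the chunk
 * ends mid-line, the partial line is closed the same way so the tagged (or
 * XML) output stays well-formed. Returns bytes written, or -1 if `out` is too
 * small. When len == 0, nothing is written and `out` is left untouched. */
static int sketch_tag_chunk(const char *chunk, size_t len, const char *prefix,
                            const char *suffix, char *out, size_t outlen)
{
    size_t pos = 0;
    size_t start = 0;
    while (start < len) {
        size_t end = start;
        while (end < len && chunk[end] != '\n') {
            end++;
        }
        int n = snprintf(out + pos, outlen - pos, "%s%.*s%s\n",
                         prefix, (int)(end - start), chunk + start, suffix);
        if (n < 0 || (size_t)n >= outlen - pos) {
            return -1;
        }
        pos += (size_t)n;
        start = end + 1;   /* step past the newline, or past the end */
    }
    return (int)pos;
}
```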
Ralph Castain
bc0fe3c6da
Add some more tests for parallel IO that have caused problems in the past.
Add a README that explains how to run the ziatest for launch timing
This commit was SVN r21576.
2009-07-01 14:47:14 +00:00
Ralph Castain
2fbdea0273
Add a test for looping over bcast
This commit was SVN r21560.
2009-06-29 17:06:19 +00:00
Ralph Castain
70b8c89b44
Fix slave spawn, which was hanging because the local daemon never saw the slave job report - it doesn't do it in the normal way, and so the slave launch system itself has to "fake it".
Also complete the implementation for printing out app_context objects so we see all the fields.
This commit was SVN r21408.
2009-06-10 19:01:08 +00:00
Ralph Castain
86d55d7ebf
Fix tight loops over comm_spawn by checking whether the system has enough child procs and file descriptors available before attempting to launch. If not, introduce a 1 sec delay and then test again. This gives the orted a chance to finish processing proc terminations from other children, hopefully creating room for the new proc(s).
Update the loop_spawn test to remove a sleep so that it runs at max speed, letting the new code catch when we overrun ourselves and wait for room to be cleared for the next comm_spawn.
This commit was SVN r21390.
2009-06-08 18:28:26 +00:00
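A minimal sketch of the check-then-wait logic. It assumes the daemon can obtain its current child count and open-descriptor count along with the corresponding limits, for example from `getrlimit` with `RLIMIT_NPROC` and `RLIMIT_NOFILE`. The struct, the function names, and the `max_tries` cap are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical snapshot of current usage vs. allowed limits. */
typedef struct {
    int num_children;  /* child procs currently alive */
    int max_children;  /* e.g. derived from RLIMIT_NPROC */
    int open_fds;      /* file descriptors currently open */
    int max_fds;       /* e.g. derived from RLIMIT_NOFILE */
} sketch_resources_t;

/* True if launching nprocs more children, each needing fds_per_proc
 * descriptors, stays within both limits. */
static bool sketch_have_room(const sketch_resources_t *r, int nprocs, int fds_per_proc)
{
    return r->num_children + nprocs <= r->max_children &&
           r->open_fds + nprocs * fds_per_proc <= r->max_fds;
}

/* Check for room, waiting 1 second between checks. `refresh` (may be NULL)
 * re-reads current usage. Returns the number of delays taken, or -1 if
 * there was still no room after max_tries checks. */
static int sketch_wait_for_room(sketch_resources_t *r, int nprocs, int fds_per_proc,
                                void (*refresh)(sketch_resources_t *), int max_tries)
{
    for (int tries = 0; tries < max_tries; tries++) {
        if (refresh != NULL) {
            refresh(r);
        }
        if (sketch_have_room(r, nprocs, fds_per_proc)) {
            return tries;
        }
        sleep(1);
    }
    return -1;
}
```

A blocking `sleep(1)` keeps the sketch short. An event-driven daemon would more likely re-arm a one-second timer, so it can keep processing terminations of other children while it waits.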
Greg Koenig
60485ff95f
This is a very large change to rename several #define values from
OMPI_* to OPAL_*. This allows the opal layer to be used more independently
of the rest of ompi.
NOTE: 9 "svn mv" operations immediately follow this commit.
This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Ralph Castain
4be24521aa
Modify the orte_process_info structure to handle a broader range of process types by replacing the individual booleans with a 32-bit bitmap. Use a set of #define's to define the individual bits, and a set of matching macros to test for them. Update the orte code base to use the macros instead of the booleans.
Minor mod to the ompi layer to use the new #define's - just one-line name replacements.
This commit was SVN r21144.
2009-05-04 11:07:40 +00:00
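A minimal sketch of replacing separate booleans with a 32-bit bitmap, plus `#define`'d bits and matching test macros. The `SKETCH_*` names, bit values, and helper are hypothetical and are not the actual ORTE identifiers.

```c
#include <stdint.h>

/* Hypothetical process-type bits; real ORTE names and values may differ. */
#define SKETCH_PROC_SINGLETON  0x00000001u
#define SKETCH_PROC_DAEMON     0x00000002u
#define SKETCH_PROC_HNP        0x00000004u
#define SKETCH_PROC_TOOL       0x00000008u
#define SKETCH_PROC_MPI        0x00000010u

/* One 32-bit field replaces a set of separate booleans. */
typedef struct {
    uint32_t proc_type;
} sketch_proc_info_t;

/* Matching test macros: nonzero if the given bit is set. */
#define SKETCH_PROC_IS_DAEMON(info) (((info)->proc_type & SKETCH_PROC_DAEMON) != 0)
#define SKETCH_PROC_IS_HNP(info)    (((info)->proc_type & SKETCH_PROC_HNP) != 0)
#define SKETCH_PROC_IS_MPI(info)    (((info)->proc_type & SKETCH_PROC_MPI) != 0)

/* Set type bits without clearing any bits already set. */
static void sketch_set_type(sketch_proc_info_t *info, uint32_t bits)
{
    info->proc_type |= bits;
}
```

A new process type then costs only a new bit and macro, not a new field, and a single word can carry several type bits at once.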
Ralph Castain
7b420e32b6
Add some missing tests
This commit was SVN r21140.
2009-05-01 18:35:22 +00:00
Ralph Castain
dfb2146430
Perform the ziatest as a C program instead of a script - less trouble that way.
This commit was SVN r21132.
2009-04-30 18:43:26 +00:00
Ralph Castain
ce206df568
Fix the danged ziatest - thx to Jeff, the mighty perl guru!
This commit was SVN r21116.
2009-04-29 17:58:35 +00:00
Rainer Keller
221fb9dbca
... Delayed due to notifier commits earlier today ...
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches that explicitly include headers that would otherwise be
"lost" because they were previously pulled in by one of the now-deleted headers...
In total, 817 files are touched.
In ompi/mpi/c/, header files are moved up into the actual C file
where necessary (these are the only additional #includes);
otherwise the changes are only deletions of #include (apart from the above
additions required by the notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Ralph Castain
f3cfe32b5d
Update the slave launch and cleanup procedures. Track what files have been moved to the slave node to avoid attempting to copy them multiple times on top of each other. Cleanup any pre-positioned files, kill any lingering apps, and cleanup the session directory area upon termination of the daemon.
This commit was SVN r21094.
2009-04-29 00:11:19 +00:00
Ralph Castain
76b6ae3b29
Fix the ziatest to report correct times
This commit was SVN r21059.
2009-04-23 01:12:56 +00:00
Ralph Castain
3c3c306ee4
Simplify test
This commit was SVN r21058.
2009-04-22 23:03:04 +00:00
Ralph Castain
1f7281957f
Need this one too...
This commit was SVN r21027.
2009-04-16 02:23:57 +00:00
Ralph Castain
9c2f17eb01
Clean up the nidmap lookup functions and add some comments explaining how we handle the nid, job, and pmap arrays. This fixes a problem that occurs when we have less-than-full participation in a comm_spawn, which causes holes to exist in the pmap array.
Update the slave spawn tests to properly indicate participation as being solely MPI_COMM_SELF.
This commit was SVN r20961.
2009-04-09 02:48:33 +00:00
Ralph Castain
4af623076d
Add a test for hanging in a loop over mpi_reduce
This commit was SVN r20798.
2009-03-17 13:57:23 +00:00
Rainer Keller
ec0ed48718
- Revert r20739
This commit was SVN r20742.
The following SVN revision numbers were found above:
r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
a94438343b
- Revert r20740
This commit was SVN r20741.
The following SVN revision numbers were found above:
r20740 --> open-mpi/ompi@2a70618a77
2009-03-05 21:50:47 +00:00
Rainer Keller
2a70618a77
- Second patch, as discussed in Louisville.
Replace short macros in orte/util/name_fns.h
with the actual function calls.
- Compiles on linux/x86-64
This commit was SVN r20740.
2009-03-05 21:14:18 +00:00
Rainer Keller
781caee0b6
- First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
change orte_proc_info() to orte_proc_info_init().
- Compiled on linux-x86-64
- Discussed with Ralph
This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Ralph Castain
47cfccbb49
Update a couple of tests
This commit was SVN r20668.
2009-03-01 15:32:32 +00:00
Rainer Keller
4c0e8e1e69
- Header orte/mca/oob/base/base.h is probably the wrong one to include
anyhow -- if oob functionality is needed, then orte/mca/oob/oob.h is the one to include.
Nevertheless compiles fine with -Wimplicit-function-declaration
This commit was SVN r20641.
2009-02-26 04:20:03 +00:00
Rainer Keller
04567d3af0
- Header orte/mca/errmgr/errmgr.h is not needed.
Once again compiles fine with -Wimplicit-function-declaration
This commit was SVN r20640.
2009-02-26 04:05:30 +00:00