Wesley Bland
87a96da99c
Should fix some of the shutdown woes of the errmgr.
...
Correctly checks that the orted's job is completed.
Correctly tests that a shutdown is in progress (doesn't rely on orte_orteds_term_ordered).
Adds a patch from Ralph to correctly check the status of processes.
This commit was SVN r24962.
2011-08-01 14:00:41 +00:00
Ralph Castain
42b125ef35
Move the debug so it more accurately reports
...
This commit was SVN r24961.
2011-07-29 20:48:46 +00:00
Ralph Castain
70bca4691f
Add a new "sensor" module that supports fault tolerance tests - randomly kills local procs and/or the daemon itself
...
This commit was SVN r24960.
2011-07-29 20:48:22 +00:00
Ralph Castain
e88a6c93da
Set properties
...
This commit was SVN r24959.
2011-07-28 22:03:31 +00:00
Wesley Bland
5fde3e0e00
Move the resilient orte errmgr code into a separate errmgr for now while it's
...
still unstable. Reverted errmgr modules back to the original errmgr (with the
updates since the resilient code was brought into the trunk).
This commit was SVN r24958.
2011-07-28 21:24:34 +00:00
Ralph Castain
6c879f87fb
Add a new param "orte_remote_tmpdir_base" for those situations where the compute nodes require a different session directory head than the head node.
...
This commit was SVN r24956.
2011-07-27 19:37:17 +00:00
Ralph Castain
decab98fb2
Do a little better job of catching up on missed mcast messages, and provide a way out of scenarios where catch-up is impossible.
...
This commit was SVN r24955.
2011-07-27 14:58:30 +00:00
Ralph Castain
c3bc33b3fb
Don't be so restrictive - accept "slots" as well as "slot" in rank file
...
This commit was SVN r24954.
2011-07-27 00:45:30 +00:00
Wesley Bland
b972fd84e1
No longer sends extra FAILED_NOTIFICATION messages in the non-failure case.
...
Should reduce finalize complexity and avoid a race condition that has been
detected by a few users.
This commit was SVN r24952.
2011-07-26 20:47:44 +00:00
Matthias Jurenz
4ca70e5c91
Changes to OTF:
...
- improved zlib compression
- otfprofile-mpi:
- fixed progress
Changes to VT:
- fixed C++ linker issue for manual instrumentation of multiple files
- fixed CUDA kernel launch configuration
- process and thread buffer size can be explicitly specified by the user via the environment variables VT_BUFFER_SIZE and VT_THREAD_BUFFER_SIZE
- fixed CUDA buffer management
- vtfilter:
- fixed progress
- vtwrapper:
- link CUPTI library, if available
- vtsetup:
- removed fixed path to *.dtd file in vtsetup-data.xml[.in] (fixes 'java.net.MalformedURLException')
This commit was SVN r24950.
2011-07-26 12:47:05 +00:00
Yevgeny Kliteynik
c1ab24c687
openib: added Mellanox ConnectX3 device ID to the device parameters ini file
...
This commit was SVN r24947.
2011-07-26 12:06:43 +00:00
Mike Dubman
aefffa073d
initial implementation of MXM MTL layer
...
This commit was SVN r24946.
2011-07-26 04:36:21 +00:00
Ralph Castain
715f871605
Ignore the daemon job when reporting parseable output
...
This commit was SVN r24944.
2011-07-25 20:44:08 +00:00
Ralph Castain
db193555c2
Use non-blocking sends for recovering from lost multicast messages
...
This commit was SVN r24943.
2011-07-25 18:49:47 +00:00
Samuel Gutierrez
adde221413
use memcpy in ds_copy.
...
This commit was SVN r24942.
2011-07-25 17:16:29 +00:00
Mike Dubman
96ef2fc0e4
fix handling datatypes which have a gap in the beginning
...
This commit was SVN r24936.
2011-07-25 06:30:09 +00:00
Ralph Castain
199804fc35
complete implementation of parseable output
...
This commit was SVN r24929.
2011-07-23 22:23:24 +00:00
Ralph Castain
ffe6f5f40e
Fix map pack/unpack so they match
...
This commit was SVN r24928.
2011-07-23 22:23:05 +00:00
Ralph Castain
00647fa342
Update orte-ps to add parseable output - not fully tested because I couldn't get other parts of the system to work.
...
This commit was SVN r24927.
2011-07-23 20:20:31 +00:00
Ralph Castain
869024f1c6
You have to initialize the daemon param -before- using it to get epoch!!
...
This commit was SVN r24926.
2011-07-23 20:19:43 +00:00
Ralph Castain
361bcef253
Close multicast before rml
...
This commit was SVN r24925.
2011-07-23 20:19:15 +00:00
Jeff Squyres
d6bc78920e
Add a few manual cleanups that were missed (i.e., this is the
...
''other'' direction, so to speak, compared to r24921).
This commit was SVN r24924.
The following SVN revision numbers were found above:
r24921 --> open-mpi/ompi@bd96d028de
2011-07-22 21:05:39 +00:00
Jeff Squyres
5fd57dad37
Add in missing ARM.asm file (this is in addition to r24875, which
...
included a missing ARM directory).
This commit was SVN r24923.
The following SVN revision numbers were found above:
r24875 --> open-mpi/ompi@ceabe91484
2011-07-22 20:04:50 +00:00
Ralph Castain
8a7f9f8997
Hide libevent symbols when internal thread support enabled
...
This commit was SVN r24922.
2011-07-22 19:49:47 +00:00
Jeff Squyres
bd96d028de
George identified some memory leaks and inconsistencies in the F77 API
...
when sizeof(int) != sizeof(MPI_Fint). This commit should fix those
problems.
This commit was SVN r24921.
2011-07-22 19:49:27 +00:00
Ralph Castain
3f0d13efe2
Fix libevent internal thread support
...
This commit was SVN r24920.
2011-07-22 19:18:49 +00:00
Jeff Squyres
352cd5bc62
Update svn:ignore
...
This commit was SVN r24917.
2011-07-22 13:56:31 +00:00
Jeff Squyres
d95f2361f8
Handle "svn st" output, even if it has a "+" in the middle of the line
...
This commit was SVN r24915.
2011-07-21 22:42:03 +00:00
Shiqing Fan
edaa7b96e4
This should not be commented out.
...
This commit was SVN r24914.
2011-07-21 12:56:18 +00:00
Shiqing Fan
cc4403a863
Remove two unused windows files.
...
This commit was SVN r24913.
2011-07-21 12:53:32 +00:00
Shiqing Fan
665d1284be
Fix a bug where the wrong temp string was memcpy'ed.
...
This commit was SVN r24912.
2011-07-21 12:53:03 +00:00
Brian Barrett
3bd66a5932
* Remove unused Portals3.3 reference implementation support
...
This commit was SVN r24906.
2011-07-20 23:30:29 +00:00
Brian Barrett
cc660fa57a
Rather than looking for any path, look for any non-absolute path starting
...
in contrib/platform, in addition to cwd
This commit was SVN r24905.
2011-07-20 23:28:17 +00:00
Eugene Loh
921852e1e5
Clean up the computations of num_procs_alive. Do some code
...
refactoring to improve readability and to compute num_procs_alive
correctly and to remove the use of loop iteration variables for
two loops nested one inside another (causing MPI_Comm_spawn_multiple
to fail).
This commit was SVN r24903.
2011-07-14 20:10:48 +00:00
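A minimal sketch of the kind of fix described in r24903, assuming the underlying problem was that two nested loops shared one iteration variable. The type demo_job_t and the function demo_count_alive are hypothetical names for illustration only; they are not taken from the Open MPI source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a job: a flag per process saying whether it is alive. */
typedef struct {
    const bool *alive;
    size_t      num_procs;
} demo_job_t;

/* Count live processes across all jobs. Each loop has its own index
 * (j for jobs, p for processes), so the inner loop can never clobber
 * the outer loop's position. */
size_t demo_count_alive(const demo_job_t *jobs, size_t num_jobs)
{
    size_t total = 0;

    assert(jobs != NULL || num_jobs == 0);

    for (size_t j = 0; j < num_jobs; ++j) {
        for (size_t p = 0; p < jobs[j].num_procs; ++p) {
            if (jobs[j].alive[p]) {
                ++total;
            }
        }
    }
    return total;
}
```

If both loops reused a single index, the inner loop would leave it at the inner bound, and the outer loop would then skip or repeat jobs. That matters most when several jobs are processed at once, as in MPI_Comm_spawn_multiple.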
Ralph Castain
6201581544
Fix the symbol visibility issue for libevent by renaming all visible libevent symbols
...
This commit was SVN r24902.
2011-07-14 07:10:52 +00:00
Abhishek Kulkarni
b64ea09d72
Fix C/R-related error messages during initialization.
...
This commit was SVN r24901.
2011-07-13 23:34:34 +00:00
Yevgeny Kliteynik
78ea8bcea2
Always define OMPI_ENABLE_DYNAMIC_SL, not only when the feature is enabled.
...
Also refactor some code: perform the checks only when relevant.
This commit was SVN r24900.
2011-07-13 23:19:58 +00:00
Ralph Castain
1d65833980
Remove mcast from the odin debug build
...
This commit was SVN r24899.
2011-07-13 22:51:22 +00:00
Ralph Castain
8853e0e80a
Fix regular expression analyzer for slurmd - use a slurm-specific version
...
Fix multi-node routing for daemon startup when static ports are not set
This commit was SVN r24898.
2011-07-13 22:49:56 +00:00
Ralph Castain
8d1b31b887
Don't know how we got away with this for so long, but we really shouldn't be referencing pointer array objects directly.
...
Also, fix an error in mpirx debugger module - the pointer array object is the pointer to the object itself, not the object "super" like in an opal_list.
This commit was SVN r24894.
2011-07-13 20:11:14 +00:00
Terry Dontje
fbda6aaf89
Fixes trac:2532 issues with 32-bit binaries
...
This commit was SVN r24891.
The following Trac tickets were found above:
Ticket 2532 --> https://svn.open-mpi.org/trac/ompi/ticket/2532
2011-07-13 16:38:03 +00:00
Ralph Castain
1405bacd85
Ensure we don't segfault if we report an error
...
This commit was SVN r24890.
2011-07-13 15:00:22 +00:00
Jeff Squyres
3893a5a1de
Fix compile error introduced in r24888.
...
This commit was SVN r24889.
The following SVN revision numbers were found above:
r24888 --> open-mpi/ompi@e5253647ea
2011-07-13 14:18:00 +00:00
Shiqing Fan
e5253647ea
Fix a type cast.
...
This commit was SVN r24888.
2011-07-13 09:00:17 +00:00
Ralph Castain
5e99d45ae4
Remove unused variable
...
This commit was SVN r24887.
2011-07-13 03:42:20 +00:00
Ralph Castain
1ad110d2e9
After a nice, calm, rational discussion between Brian, Jeff, and myself, we decided to revert r24864 and r24862 to restore the reference counters in opal_init/finalize. The rationale was that we should instead change orte_init/finalize to also use reference counters to support multi-embedded libraries. Jeff and Brian will discuss proposing a similar change to mpi_init/finalize to the MPI Forum so that all three libraries will behave in similar manners.
...
It was agreed that opal_init_util had wound up being used in unintended ways, which raised the problem of getting reference counts to work right. However, fixing it would involve more pain than it was worth - and so long as the other layers are made to behave similarly, I have no preference either way.
Complete implementation will follow - for now, this just reverts the prior changes.
This commit was SVN r24886.
The following SVN revision numbers were found above:
r24862 --> open-mpi/ompi@aa92e0c4eb
r24864 --> open-mpi/ompi@a5062385c2
2011-07-12 17:07:41 +00:00
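The reference-counted init/finalize pattern discussed in r24886 can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual opal_init/opal_finalize code: the names demo_init, demo_finalize and demo_is_ready are hypothetical, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical library state: how many callers have initialized us,
 * and whether the shared resources currently exist. */
static int  demo_init_refcount   = 0;
static bool demo_resources_ready = false;

/* Only the first caller actually sets things up; later callers
 * just bump the counter. */
int demo_init(void)
{
    if (++demo_init_refcount > 1) {
        return 0;               /* already initialized by someone else */
    }
    demo_resources_ready = true; /* real setup work would go here */
    return 0;
}

/* Only the last caller actually tears things down. An unbalanced
 * finalize is reported as an error instead of driving the count negative. */
int demo_finalize(void)
{
    assert(demo_init_refcount >= 0);

    if (demo_init_refcount == 0) {
        return -1;              /* finalize without a matching init */
    }
    if (--demo_init_refcount > 0) {
        return 0;               /* other users are still active */
    }
    demo_resources_ready = false; /* real teardown work would go here */
    return 0;
}

bool demo_is_ready(void)
{
    return demo_resources_ready;
}
```

With this shape, a library embedded inside another library can call init and finalize in matched pairs without tearing down state that an outer layer still depends on.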
Nathan Hjelm
3f4e5d7dd6
add missing thread lock/unlock around condition_broadcast
...
This commit was SVN r24885.
2011-07-12 15:43:56 +00:00
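As a general illustration of why r24885 takes the lock around the broadcast, here is a minimal sketch using plain POSIX threads. It does not use the OPAL threading API, and demo_waiter_t with its functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion flag guarded by a mutex and condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
} demo_waiter_t;

int demo_waiter_init(demo_waiter_t *w)
{
    assert(w != NULL);
    w->done = false;
    if (pthread_mutex_init(&w->lock, NULL) != 0) {
        return -1;
    }
    if (pthread_cond_init(&w->cond, NULL) != 0) {
        pthread_mutex_destroy(&w->lock);
        return -1;
    }
    return 0;
}

/* Update the predicate and broadcast while holding the lock, so a waiter
 * cannot check the flag, miss the wakeup, and then block forever. */
void demo_waiter_signal(demo_waiter_t *w)
{
    assert(w != NULL);
    pthread_mutex_lock(&w->lock);
    w->done = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Wait in a loop on the predicate to tolerate spurious wakeups. */
void demo_waiter_wait(demo_waiter_t *w)
{
    assert(w != NULL);
    pthread_mutex_lock(&w->lock);
    while (!w->done) {
        pthread_cond_wait(&w->cond, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);
}
```

The essential part is that the flag update happens under the same mutex the waiter holds while testing it. Broadcasting while still holding the lock, as done here, is the conservative choice.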
Nathan Hjelm
c3ec2e2614
fix a potential race condition in rml
...
This commit was SVN r24884.
2011-07-12 15:43:12 +00:00
Nadia Derbey
0d0cead33a
Fix a hang in carto_base_select() if carto_module_init() fails
...
This commit was SVN r24876.
2011-07-12 05:47:28 +00:00
Jeff Squyres
ceabe91484
Yow; we forgot to include the ARM stuff in the tarball. :-(
...
This commit was SVN r24875.
2011-07-11 23:52:07 +00:00