Samuel Gutierrez
bb791eaa23
change opal_output_verbose level to be consistent with shmem base.
...
This commit was SVN r25036.
2011-08-09 21:34:12 +00:00
Nathan Hjelm
aa3d302a05
use persistent rml_recv in iof
...
This commit was SVN r25035.
2011-08-09 21:30:12 +00:00
Samuel Gutierrez
b144c8c343
silence warning in shmem posix run-time test when err is not equal to EEXIST.
...
This commit was SVN r25034.
2011-08-09 21:13:28 +00:00
Ralph Castain
f1951e7ccd
If we are abnormally terminating, then don't wait for orteds to report back. Send them a "halt_vm" command, which instructs them to kill their local procs and immediately terminate, doing their best to cleanup on the way out.
...
Also do a little cleanup on debug output in rshbase.
This commit was SVN r25033.
2011-08-09 17:42:19 +00:00
Jeff Squyres
f96db45c17
Re-word one of the bullets.
...
This commit was SVN r25029.
2011-08-09 14:54:47 +00:00
Jeff Squyres
ecf8c805e6
Update NEWS for v1.5.4.
...
This commit was SVN r25028.
2011-08-09 13:28:12 +00:00
Mike Dubman
a751cd93d3
improve debug macro availability
...
This commit was SVN r25022.
2011-08-09 10:54:08 +00:00
Mike Dubman
bfd75de6f9
fix selection logic: if no suitable device found - disqualify mxm w/o complaints.
...
This commit was SVN r25021.
2011-08-09 07:09:37 +00:00
Wesley Bland
67feeb6aca
Move the errmgr code back. This shouldn't cause the svn problems that I
...
apparently caused last time. Sorry about that. This one will just be a big
changelog.
This commit was SVN r25016.
2011-08-08 16:01:08 +00:00
Wesley Bland
09274cd047
Make sure that the epoch is initialized everywhere so we don't get weird output
...
during valgrind. This shouldn't have caused any problems with any actual
execution. Just extra warnings in valgrind.
This commit was SVN r25015.
2011-08-08 15:11:55 +00:00
Matthias Jurenz
3a6e9b19ee
Fixed several Coverity warnings
...
This commit was SVN r25014.
2011-08-08 12:53:58 +00:00
Ralph Castain
8014e3429e
Don't double-count procs as they are launched
...
This commit was SVN r25011.
2011-08-08 06:05:23 +00:00
Ralph Castain
7b9f958dcf
Add some missing error strings. Update test to show silent errors
...
This commit was SVN r25010.
2011-08-08 04:21:02 +00:00
Ralph Castain
da9bbf68ec
Fix the output of error strings. Every convertor is returning OPAL_SUCCESS, so you have to check each convertor to find which one this error belongs to, and then run ONLY that convertor.
...
This commit was SVN r25009.
2011-08-08 04:10:40 +00:00
Ralph Castain
4083dc617f
Fix computation of number of required files and file descriptors - it only depends on the total number of local procs, not on the number of procs in the entire job!
...
This commit was SVN r25008.
2011-08-08 04:09:40 +00:00
Ralph Castain
590ac70e88
Add a simple test program for error string output
...
This commit was SVN r25007.
2011-08-07 21:32:25 +00:00
Ralph Castain
8b3c562b84
Adjust verbosity levels to make it easier to debug at scale
...
This commit was SVN r25006.
2011-08-07 21:14:21 +00:00
Mike Dubman
1d3f5e1314
better mxm selection mechanism, some refactoring
...
This commit was SVN r25005.
2011-08-07 12:06:49 +00:00
Ralph Castain
2418831bea
Pass the nodelist to the aprun command even when using all nodes
...
This commit was SVN r25004.
2011-08-06 04:19:41 +00:00
Ralph Castain
bd8e43a2de
Correct debug output so it doesn't falsely report the module
...
This commit was SVN r25003.
2011-08-05 20:30:34 +00:00
Ralph Castain
d603c79ab4
Fix the FAILED_TO_START scenario so orted doesn't segfault
...
This commit was SVN r25002.
2011-08-05 20:29:50 +00:00
Ralph Castain
c86bfb4e90
Need to copy the string
...
This commit was SVN r25001.
2011-08-05 19:03:28 +00:00
Ralph Castain
7b307d5bf0
Cleanup handling of all-numerical node names
...
This commit was SVN r25000.
2011-08-05 14:59:14 +00:00
Ralph Castain
157bad5435
If we can't compress the name, that's fine - but still have to move to next posn
...
This commit was SVN r24999.
2011-08-05 14:43:36 +00:00
Ralph Castain
3199663613
Correctly handle the case of mixes of character-based names and all-number names
...
This commit was SVN r24998.
2011-08-05 14:37:36 +00:00
Matthias Jurenz
1b402ecb1a
Changes to OTF:
...
- always check the result of OTF_WStream_get*Buffer since it might be NULL in case OTF_File_open fails
Changes to VT:
- CUDA Tracing:
- fixed configure stack for filtered kernels
- fixed buffer size for CUPTI tracing
- replaced error message with warning to continue tracing, even if CUDA error occurred (VTCUDAsynchronizeEvt)
- vtunify:
- enlarged minimum message size for transferring local definitions to rank 0
- use binary search for searching already created global definitions
- use binary search for searching already created global marker definitions
- use LargeVectorC instead of std::vector for pre-allocating elements
- vtwrapper:
- added options '-vt:CC' and '-vt:c++' which are synonyms for '-vt:cxx'
This commit was SVN r24997.
2011-08-05 12:46:26 +00:00
Jeff Squyres
d1a0c4428f
Add svn:ignore
...
This commit was SVN r24994.
2011-08-05 12:22:27 +00:00
Ralph Castain
066022126e
Sort the nodes to be in numerically increasing order so the regex has a chance of working right.
...
This commit was SVN r24993.
2011-08-05 03:37:13 +00:00
Ralph Castain
5a634caad9
Cleanly handle the case where the node "name" is just a number, and avoid the N-N output when the number is not part of a sequence.
...
This commit was SVN r24992.
2011-08-05 03:36:30 +00:00
Yevgeny Kliteynik
7068dc64eb
Dynamic SL rework:
...
- Added dynamic SL support to xoob
- Fixed seg fault in finalization
- All the code has been moved to separate files: connect/btl_openib_connect_sl.{c,h}
- The new files compilation is conditionalized
This commit was SVN r24991.
2011-08-04 20:26:08 +00:00
Jeff Squyres
31311c981b
Add note about C++ bindings cosmetic fix.
...
This commit was SVN r24990.
2011-08-04 15:44:04 +00:00
Jeff Squyres
d28564aa26
Per http://www.open-mpi.org/community/lists/devel/2011/08/9606.php ,
...
comment out some unused parameter names. I didn't use
__opal_attribute_unused__ because comm_inln.h is (eventually) included
by <mpi.h>, and therefore we don't have all the OPAL config stuff
available. And it didn't seem worth it to add the optional
attribute_unused stuff to the top of mpi.h.
Thanks to Júlio Hoffimann for reporting the issue.
This commit was SVN r24989.
2011-08-04 15:39:12 +00:00
Jeff Squyres
ba432393d4
Remove some really old (internal) cruft that never ended up getting
...
used.
This commit was SVN r24988.
2011-08-04 15:24:37 +00:00
Rolf vandeVaart
3d3b3d4dad
Add support for CUDA registering sm and openib buffers. Feature is disabled by default.
...
This commit was SVN r24987.
2011-08-04 10:15:45 +00:00
Mike Dubman
9928c33edd
better description of MXM MTL
...
This commit was SVN r24986.
2011-08-04 07:57:46 +00:00
Jeff Squyres
288915ac6a
Add svn:ignore
...
This commit was SVN r24985.
2011-08-03 23:38:12 +00:00
Jeff Squyres
294e1f50cd
Remove compiler warning about nested comment
...
This commit was SVN r24984.
2011-08-03 18:30:56 +00:00
Jeff Squyres
50ab8d893c
Recent (as of 3 Aug 2011) versions of LWP in Macports seem to have
...
broken SSL certificate verification. The IU CA is in my Mac system
keychain (and has been there for quite a long time), but after a
recent ports update, LWP fails the SSL certificate verification.
Fine. So we'll just turn it off, per
http://search.cpan.org/~gaas/libwww-perl-6.02/lib/LWP/UserAgent.pm .
This commit was SVN r24983.
2011-08-03 13:50:23 +00:00
Jeff Squyres
ecc7937584
Format the README a bit and shape up some of the text about MXM.
...
Still need a bit more, though.
This commit was SVN r24982.
2011-08-03 13:22:56 +00:00
Jeff Squyres
cebd1837e5
Add special token to gkcommit commit messages so that the SVN
...
pre-commit hook doesn't try to re-close tickets that are referred to
in the original SVN commit messages.
This commit was SVN r24981.
2011-08-03 13:02:45 +00:00
Mike Dubman
7b18ab2fa9
remove unused includes
...
This commit was SVN r24980.
2011-08-03 07:07:29 +00:00
Jeff Squyres
f539b20a8f
Patch from ARM for assembly:
...
http://www.open-mpi.org/community/lists/devel/2011/08/9586.php
This commit was SVN r24979.
2011-08-02 19:15:24 +00:00
Mike Dubman
45ea375531
code and readme updates, some refactoring
...
This commit was SVN r24977.
2011-08-02 14:30:11 +00:00
Jeff Squyres
8f4ac54336
Fixes trac:2838: add a warning message and disqualify the TCP BTL if both
...
btl_tcp_if_include and btl_tcp_if_exclude are specified.
This commit was SVN r24976.
The following Trac tickets were found above:
Ticket 2838 --> https://svn.open-mpi.org/trac/ompi/ticket/2838
2011-08-01 23:30:33 +00:00
Wesley Bland
87a96da99c
Should fix some of the shutdown woes of the errmgr.
...
Correctly checks that the orted's job is completed.
Correctly tests to make sure that there is shutdown going on (doesn't rely on orte_orteds_term_ordered).
Adds a patch from Ralph to correctly check the status of processes.
This commit was SVN r24962.
2011-08-01 14:00:41 +00:00
Ralph Castain
42b125ef35
Move the debug so it more accurately reports
...
This commit was SVN r24961.
2011-07-29 20:48:46 +00:00
Ralph Castain
70bca4691f
Add a new "sensor" module that supports fault tolerance tests - randomly kills local procs and/or the daemon itself
...
This commit was SVN r24960.
2011-07-29 20:48:22 +00:00
Ralph Castain
e88a6c93da
Set properties
...
This commit was SVN r24959.
2011-07-28 22:03:31 +00:00
Wesley Bland
5fde3e0e00
Move the resilient orte errmgr code into a separate errmgr for now while it's
...
still unstable. Reverted errmgr modules back to the original errmgr (with the
updates since the resilient code was brought into the trunk).
This commit was SVN r24958.
2011-07-28 21:24:34 +00:00
Ralph Castain
6c879f87fb
Add a new param "orte_remote_tmpdir_base" for those situations where the compute nodes require a different session directory head than the head node.
...
This commit was SVN r24956.
2011-07-27 19:37:17 +00:00