Commit graph

15860 commits

Author SHA1 Message Date
Wesley Bland
67feeb6aca Move the errmgr code back. This shouldn't cause the svn problems that I
apparently caused last time. Sorry about that. This one will just be a big
changelog.  

This commit was SVN r25016.
2011-08-08 16:01:08 +00:00
Wesley Bland
09274cd047 Make sure that the epoch is initialized everywhere so we don't get weird output
during valgrind. This shouldn't have caused any problems with any actual
execution. Just extra warnings in valgrind.

This commit was SVN r25015.
2011-08-08 15:11:55 +00:00
Matthias Jurenz
3a6e9b19ee Fixed several Coverity warnings
This commit was SVN r25014.
2011-08-08 12:53:58 +00:00
Ralph Castain
8014e3429e Don't double-count procs as they are launched
This commit was SVN r25011.
2011-08-08 06:05:23 +00:00
Ralph Castain
7b9f958dcf Add some missing error strings. Update test to show silent errors
This commit was SVN r25010.
2011-08-08 04:21:02 +00:00
Ralph Castain
da9bbf68ec Fix the output of error strings. Every convertor is returning OPAL_SUCCESS, so you have to check each convertor to find which one this error belongs to, and then run ONLY that convertor.
This commit was SVN r25009.
2011-08-08 04:10:40 +00:00
Ralph Castain
4083dc617f Fix computation of number of required files and file descriptors - it only depends on the total number of local procs, not on the number of procs in the entire job!
This commit was SVN r25008.
2011-08-08 04:09:40 +00:00
Ralph Castain
590ac70e88 Add a simple test program for error string output
This commit was SVN r25007.
2011-08-07 21:32:25 +00:00
Ralph Castain
8b3c562b84 Adjust verbosity levels to make it easier to debug at scale
This commit was SVN r25006.
2011-08-07 21:14:21 +00:00
Mike Dubman
1d3f5e1314 better mxm selection mechanism, some refactoring
This commit was SVN r25005.
2011-08-07 12:06:49 +00:00
Ralph Castain
2418831bea Pass the nodelist to the aprun command even when using all nodes
This commit was SVN r25004.
2011-08-06 04:19:41 +00:00
Ralph Castain
bd8e43a2de Correct debug output so it doesn't falsely report the module
This commit was SVN r25003.
2011-08-05 20:30:34 +00:00
Ralph Castain
d603c79ab4 Fix the FAILED_TO_START scenario so orted doesn't segfault
This commit was SVN r25002.
2011-08-05 20:29:50 +00:00
Ralph Castain
c86bfb4e90 Need to copy the string
This commit was SVN r25001.
2011-08-05 19:03:28 +00:00
Ralph Castain
7b307d5bf0 Cleanup handling of all-numerical node names
This commit was SVN r25000.
2011-08-05 14:59:14 +00:00
Ralph Castain
157bad5435 If we can't compress the name, that's fine - but still have to move to next posn
This commit was SVN r24999.
2011-08-05 14:43:36 +00:00
Ralph Castain
3199663613 Correctly handle the case of mixes of character-based names and all-number names
This commit was SVN r24998.
2011-08-05 14:37:36 +00:00
Matthias Jurenz
1b402ecb1a Changes to OTF:
- always check the result of OTF_WStream_get*Buffer since it might be NULL in case OTF_File_open fails

Changes to VT:
	- CUDA Tracing:
		- fixed configure stack for filtered kernels
		- fixed buffer size for CUPTI tracing
		- replaced error message with warning to continue tracing, even if CUDA error occurred (VTCUDAsynchronizeEvt)
	- vtunify:
		- enlarged minimum message size for transferring local definitions to rank 0
		- use binary search for searching already created global definitions
		- use binary search for searching already created global marker definitions
		- use LargeVectorC instead of std::vector for pre-allocating elements
	- vtwrapper:
		- added options '-vt:CC' and '-vt:c++' which are synonyms for '-vt:cxx'

This commit was SVN r24997.
2011-08-05 12:46:26 +00:00
Jeff Squyres
d1a0c4428f Add svn:ignore
This commit was SVN r24994.
2011-08-05 12:22:27 +00:00
Ralph Castain
066022126e Sort the nodes to be in numerically increasing order so the regex has a chance of working right.
This commit was SVN r24993.
2011-08-05 03:37:13 +00:00
Ralph Castain
5a634caad9 Cleanly handle the case where the node "name" is just a number, and avoid the N-N output when the number is not part of a sequence.
This commit was SVN r24992.
2011-08-05 03:36:30 +00:00
Yevgeny Kliteynik
7068dc64eb Dynamic SL rework:
 - Added dynamic SL support to xoob
 - Fixed seg fault in finalization
 - All the code has been moved to separate files: connect/btl_openib_connect_sl.{c,h}
 - The new files compilation is conditionalized

This commit was SVN r24991.
2011-08-04 20:26:08 +00:00
Jeff Squyres
31311c981b Add note about C++ bindings cosmetic fix.
This commit was SVN r24990.
2011-08-04 15:44:04 +00:00
Jeff Squyres
d28564aa26 Per http://www.open-mpi.org/community/lists/devel/2011/08/9606.php,
comment out some unused parameter names.  I didn't use
__opal_attribute_unused__ because comm_inln.h is (eventually) included
by <mpi.h>, and therefore we don't have all the OPAL config stuff
available.  And it didn't seem worth it to add the optional
attribute_unused stuff to the top of mpi.h.

Thanks to Júlio Hoffimann for reporting the issue.

This commit was SVN r24989.
2011-08-04 15:39:12 +00:00
Jeff Squyres
ba432393d4 Remove some really old (internal) kruft that never ended up getting
used. 

This commit was SVN r24988.
2011-08-04 15:24:37 +00:00
Rolf vandeVaart
3d3b3d4dad Add support for CUDA registering sm and openib buffers. Feature is disabled by default.
This commit was SVN r24987.
2011-08-04 10:15:45 +00:00
Mike Dubman
9928c33edd better description of MXM MTL
This commit was SVN r24986.
2011-08-04 07:57:46 +00:00
Jeff Squyres
288915ac6a Add svn:ignore
This commit was SVN r24985.
2011-08-03 23:38:12 +00:00
Jeff Squyres
294e1f50cd Remove compiler warning about nested comment
This commit was SVN r24984.
2011-08-03 18:30:56 +00:00
Jeff Squyres
50ab8d893c Recent (as of 3 Aug 2011) versions of LWP in Macports seem to have
broken SSL certificate verification.  The IU CA is in my Mac system
keychain (and has been there for quite a long time), but after a
recent ports update, LWP fails the SSL certificate verification.
Fine.  So we'll just turn it off, per
http://search.cpan.org/~gaas/libwww-perl-6.02/lib/LWP/UserAgent.pm.

This commit was SVN r24983.
2011-08-03 13:50:23 +00:00
Jeff Squyres
ecc7937584 Format the README a bit and shape up some of the text about MXM.
Still need a bit more, though.

This commit was SVN r24982.
2011-08-03 13:22:56 +00:00
Jeff Squyres
cebd1837e5 Add special token to gkcommit commit messages so that the SVN
pre-commit hook doesn't try to re-close tickets that are referred to
in the original SVN commit messages.

This commit was SVN r24981.
2011-08-03 13:02:45 +00:00
Mike Dubman
7b18ab2fa9 remove unused includes
This commit was SVN r24980.
2011-08-03 07:07:29 +00:00
Jeff Squyres
f539b20a8f Patch from ARM for assembly:
http://www.open-mpi.org/community/lists/devel/2011/08/9586.php

This commit was SVN r24979.
2011-08-02 19:15:24 +00:00
Mike Dubman
45ea375531 code and readme updates, some refactoring
This commit was SVN r24977.
2011-08-02 14:30:11 +00:00
Jeff Squyres
8f4ac54336 Fixes trac:2838: add a warning message and disqualify the TCP BTL if both
btl_tcp_if_include and btl_tcp_if_exclude are specified. 

This commit was SVN r24976.

The following Trac tickets were found above:
  Ticket 2838 --> https://svn.open-mpi.org/trac/ompi/ticket/2838
2011-08-01 23:30:33 +00:00
Wesley Bland
87a96da99c Should fix some of the shutdown woes of the errmgr.
Correctly checks that the orted's job is completed.
Correctly tests to make sure that there is shutdown going on (doesn't rely on orte_orteds_term_ordered).
Adds a patch from Ralph to correctly check the status of processes.

This commit was SVN r24962.
2011-08-01 14:00:41 +00:00
Ralph Castain
42b125ef35 Move the debug so it more accurately reports
This commit was SVN r24961.
2011-07-29 20:48:46 +00:00
Ralph Castain
70bca4691f Add a new "sensor" module that supports fault tolerance tests - randomly kills local procs and/or the daemon itself
This commit was SVN r24960.
2011-07-29 20:48:22 +00:00
Ralph Castain
e88a6c93da Set properties
This commit was SVN r24959.
2011-07-28 22:03:31 +00:00
Wesley Bland
5fde3e0e00 Move the resilient orte errmgr code into a separate errmgr for now while it's
still unstable. Reverted errmgr modules back to the original errmgr (with the
updates since the resilient code was brought into the trunk).

This commit was SVN r24958.
2011-07-28 21:24:34 +00:00
Ralph Castain
6c879f87fb Add a new param "orte_remote_tmpdir_base" for those situations where the compute nodes require a different session directory head than the head node.
This commit was SVN r24956.
2011-07-27 19:37:17 +00:00
Ralph Castain
decab98fb2 Do a little better job of catching up on missed mcast messages, and provide a way out of scenarios where catch-up is impossible.
This commit was SVN r24955.
2011-07-27 14:58:30 +00:00
Ralph Castain
c3bc33b3fb Don't be so restrictive - accept "slots" as well as "slot" in rank file
This commit was SVN r24954.
2011-07-27 00:45:30 +00:00
Wesley Bland
b972fd84e1 No longer sends extra FAILED_NOTIFICATION messages in the non-failure case.
Should reduce finalize complexity and avoid a race condition that has been
detected by a few users.

This commit was SVN r24952.
2011-07-26 20:47:44 +00:00
Matthias Jurenz
4ca70e5c91 Changes to OTF:
   - improved zlib compression
   - otfprofile-mpi:
      - fixed progress

Changes to VT:
   - fixed C++ linker issue for manual instrumentation of multiple files
   - fixed CUDA kernel launch configuration
   - process and thread buffer size can be explicitly specified by the user via the environment variables VT_BUFFER_SIZE and VT_THREAD_BUFFER_SIZE
   - fixed CUDA buffer management
   - vtfilter:
      - fixed progress
   - vtwrapper:
      - link CUPTI library, if available
   - vtsetup:
      - removed fixed path to *.dtd file in vtsetup-data.xml[.in] (fixes 'java.net.MalformedURLException')

This commit was SVN r24950.
2011-07-26 12:47:05 +00:00
Yevgeny Kliteynik
c1ab24c687 openib: added Mellanox ConnectX3 device ID to the device parameters ini file
This commit was SVN r24947.
2011-07-26 12:06:43 +00:00
Mike Dubman
aefffa073d initial implementation of MXM MTL layer
This commit was SVN r24946.
2011-07-26 04:36:21 +00:00
Ralph Castain
715f871605 Ignore the daemon job when reporting parseable output
This commit was SVN r24944.
2011-07-25 20:44:08 +00:00
Ralph Castain
db193555c2 Use non-blocking sends for recovering from lost multicast messages
This commit was SVN r24943.
2011-07-25 18:49:47 +00:00