Commit graph

11658 commits

Author SHA1 Message Date
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Ralph Castain
83dd3d8c6f Restore the ability to forcibly terminate by providing multiple ctrl-c's
This commit was SVN r18618.
2008-06-09 13:08:54 +00:00
Ralph Castain
e1e224b81a Silence a couple of minor compiler warnings
This commit was SVN r18617.
2008-06-09 12:57:41 +00:00
Jeff Squyres
c087b4cd4f * Revert r18067
* Add specific comments about why we're not setting MPI_ERROR here

This commit was SVN r18616.

The following SVN revision numbers were found above:
  r18067 --> open-mpi/ompi@58e31d767e
2008-06-07 02:44:10 +00:00
Pak Lui
caac0e0182 Add in a couple missing ones from r18611 for all tm users out there...
This commit was SVN r18615.

The following SVN revision numbers were found above:
  r18611 --> open-mpi/ompi@7bee71aa59
2008-06-06 22:53:43 +00:00
Ralph Castain
b65eb54ea2 Cut out a new iof pull - that capability isn't ready yet for the trunk, but will be coming shortly
Thanks to Pak for letting me know...

This commit was SVN r18614.
2008-06-06 21:24:15 +00:00
Pak Lui
7f7777a538 Check for NULL in prefix_dir.
This commit fixes trac:1337.

This commit was SVN r18612.

The following Trac tickets were found above:
  Ticket 1337 --> https://svn.open-mpi.org/trac/ompi/ticket/1337
2008-06-06 19:55:01 +00:00
Ralph Castain
7bee71aa59 Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNP's, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit.
Add a new function to opal_progress that tells us our recursion depth to support that solution.

Yes, I know this sounds picky, but good ol' Jeff managed to make it happen by driving his cluster near to death...

Also ensure that we declare "failed" for the daemon job when daemons fail instead of the application job. This is important so that orte knows that it cannot use xcast to tell daemons to "exit", nor should it expect all daemons to respond. Otherwise, it is possible to hang.

After lots of testing, decide to default (again) to slurm detecting failed orteds. This proved necessary to avoid rather annoying hangs that were difficult to recover from. There are conditions where slurm will fail to launch all daemons (slurm folks are working on it), and yet again, good ol' Jeff managed to find both of them.

Thank you Jeff! :-/

This commit was SVN r18611.
2008-06-06 19:36:27 +00:00
George Bosilca
2aec094d56 The PML V is a component so it should use OMPI_MODULE_DECLSPEC.
This commit was SVN r18610.
2008-06-06 17:43:57 +00:00
George Bosilca
b2aa751c28 Remove a race condition in the threaded mode. As a callback is allowed
to modify the callback array (add or remove), make sure we don't call
the same callback twice if it gets removed in another thread.

This commit was SVN r18608.
2008-06-06 15:54:40 +00:00
George Bosilca
ae7bca2f4a Update the MPI_ERROR field as well.
This commit was SVN r18607.
2008-06-06 15:53:17 +00:00
Josh Hursey
1de50b523c Fix some Coverity 'Event set_but_not_used' highlights.
Thanks to Jeff for bringing them to my attention.

This commit was SVN r18606.
2008-06-06 14:38:41 +00:00
Terry Dontje
e8c8d0c03b This commit fixes trac:1336.
This commit was SVN r18605.

The following Trac tickets were found above:
  Ticket 1336 --> https://svn.open-mpi.org/trac/ompi/ticket/1336
2008-06-06 12:56:45 +00:00
Jeff Squyres
1f226b5898 Adjust the comment to be correct, per
http://www.open-mpi.org/community/lists/devel/2008/06/4095.php.

This commit was SVN r18604.
2008-06-06 01:23:58 +00:00
Jeff Squyres
12a3fe57e1 As pointed out by Ralf
W. (http://www.open-mpi.org/community/lists/devel/2008/06/4095.php),
these dependencies don't need to be here.

This commit was SVN r18603.
2008-06-06 01:20:47 +00:00
Jeff Squyres
b123629e6a Fix CIDs 458, 716, 717: ensure that strings are long enough to always
be properly \0 terminated.

This commit was SVN r18602.
2008-06-06 00:59:08 +00:00
Jeff Squyres
85834b22e6 Change the default to not enable heterogeneous builds; we detect at run-time if a heterogeneous job was started and will barf appropriately if OMPI was not compiled with heterogeneous support
This commit was SVN r18601.
2008-06-06 00:00:37 +00:00
Jeff Squyres
e2b08aaca4 Fix bad free's found in CID 707 and CID 708.
This commit was SVN r18600.
2008-06-05 20:49:33 +00:00
Jeff Squyres
1a748bc7be First cut at the NetEffect NE020 NIC.
This commit was SVN r18599.
2008-06-05 20:24:24 +00:00
Jeff Squyres
d3795d7a34 Fix CID 987: remove unused variable.
This commit was SVN r18598.
2008-06-05 20:17:02 +00:00
Jeff Squyres
9109f7126a Per CID 988, free some memory that would be leaked in an error condition.
This commit was SVN r18597.
2008-06-05 20:04:38 +00:00
Jeff Squyres
f0d465c30a Slightly simplify the code and remove a compiler warning.
This commit was SVN r18596.
2008-06-05 19:08:08 +00:00
Jeff Squyres
b1999bbba3 * Use inclusive NIC/HCA language
* Add a description of receive_queues

This commit was SVN r18595.
2008-06-05 19:07:22 +00:00
Tim Mattox
14cc458784 Resync the NEWS file with changes for 1.2.7
This commit was SVN r18594.
2008-06-05 18:50:24 +00:00
Pavel Shamis
7b9024bc05 Updating Mellanox's Copyright in files touched in 2008
This commit was SVN r18592.
2008-06-05 13:40:26 +00:00
Ralph Castain
6ddcce4085 Apply a patch from Edgar to fix the Intercomm MTT tests.
Fixes ticket #1332

This commit was SVN r18591.
2008-06-05 12:53:12 +00:00
Pavel Shamis
379e00050c Fixing openib btl finalize flow. Bug fix for #1286.
This commit was SVN r18590.
2008-06-05 12:20:13 +00:00
Lenny Verkhovsky
a8b5dcb204 Added more output info about socket:core pair in paffinity / rankfile components
This commit was SVN r18589.
2008-06-05 10:28:44 +00:00
Ralph Castain
332e6c89ab Modify the slurm launcher so that the kill-on-bad-exit behavior is not "on" by default. Instead, only turn it "on" if the plm_slurm_detect_failure mca param is set to something non-zero
This commit was SVN r18588.
2008-06-04 23:59:53 +00:00
Ralph Castain
0da811ce79 Initial work on xml support - allocation and job map outputs completed. More to come.
This commit was SVN r18587.
2008-06-04 20:53:12 +00:00
Ralph Castain
ca91ec525b Add a suffix to the opal_output stream descriptor object - we can now output both a prefix and a suffix for a given stream. Default the suffix to NULL.
Remove lingering references to a filtering system as this will no longer be implemented.

This commit was SVN r18586.
2008-06-04 20:52:20 +00:00
Jeff Squyres
91a281080a Fix a compiler warning for a case that would never really happen
anyway.  Rename a variable to be a bit more descriptive.

This commit was SVN r18585.
2008-06-04 19:10:23 +00:00
Jeff Squyres
bc584dedd6 Remove a compiler warning that would never happen in practice.
This commit was SVN r18584.
2008-06-04 19:03:02 +00:00
Josh Hursey
78f14b5255 Fix the none.checkpoint command.
orte-checkpoint/orte-restart do not seem to totally like orte_output, so revert them to opal_output for now. Since we have no need for the additional complexity of orte_output, we can drop it for now and revisit this if anyone needs it later.

It seems that if you set the verbose level on an output handle and then try to call a normal orte_output() on it, the message will *not* be printed. This is the same for opal_output, and seems incorrect to me because it stops some error messages from being printed out if you do not directly specify opal_output(0, ...). Maybe someone should take a look at this.


orte-checkpoint would segv if passed an incorrect PID. Fixed the return code so it errors out properly.

Thanks to Eric Roman for bringing this to my attention.

This commit was SVN r18583.
2008-06-04 14:44:11 +00:00
Jeff Squyres
6e37dd0ef0 Fix some 32/64 printf errors once and for all
This commit was SVN r18582.
2008-06-04 14:39:37 +00:00
Pavel Shamis
0a8321e08d Calls to APM functions should be protected with OMPI_HAVE_THREADS.
This commit was SVN r18581.
2008-06-04 14:27:41 +00:00
Jeff Squyres
5e918ad25d Add first cut of NetXen iWARP NIC definition. May still be refined
with more experimentation.

This commit was SVN r18580.
2008-06-04 12:11:45 +00:00
Jeff Squyres
530a15baa4 Fix cross-compiling scenario with valgrind.m4.
This commit was SVN r18579.
2008-06-04 11:58:41 +00:00
Matthias Jurenz
7f5730d073 Bugfix: Removed *unused* constructors of structure 'FirstHandlerArgument' (Ticket #1318)
This commit was SVN r18578.
2008-06-04 11:53:17 +00:00
Matthias Jurenz
f9b2fa95aa Added some words to Open MPI in the section "Introduction"
This commit was SVN r18577.
2008-06-04 11:52:57 +00:00
Shiqing Fan
2dc812f720 Clean configure.m4 of memchecker/valgrind.
If Valgrind is requested but the wrong version is supplied, print error messages and stop.
Save the CPPFLAGS in opal_memchecker_valgrind_CPPFLAGS, which could be used in
Makefile.am.

Many thanks to Jeff. 

This commit was SVN r18573.
2008-06-04 11:46:50 +00:00
Ralph Castain
9927b2445c Remove the filter framework - the xml support will have to be provided in a different manner that will be implemented shortly
This commit was SVN r18572.
2008-06-04 09:04:51 +00:00
Pavel Shamis
c73ed2b256 Updating cpc name from xrc to xoob.
This commit was SVN r18571.
2008-06-04 08:50:30 +00:00
Jeff Squyres
75a97ebbf0 Many thanks to Ralf W. for finding a subtle bug in these Makefile.am's
that can *sometimes* cause problems with "make -j [N>1] install".
Ensure to make the target directory before we copy stuff into it --
read the thread starting here for more details:

    http://www.open-mpi.org/community/lists/devel/2008/06/4080.php

This commit was SVN r18570.
2008-06-04 01:28:03 +00:00
Ralph Castain
8ce4b64b5a Ensure we don't go past the end of the array
This commit was SVN r18569.
2008-06-03 21:31:02 +00:00
George Bosilca
25ae9c12e6 Silence a few warnings.
This commit was SVN r18568.
2008-06-03 19:58:40 +00:00
George Bosilca
fa89d299bf Silence the Obj-C compiler.
This commit was SVN r18567.
2008-06-03 19:24:17 +00:00
Tim Mattox
4d548485e2 Another NEWS file resync with v1.2.7 changes.
This commit was SVN r18566.
2008-06-03 19:02:57 +00:00
Tim Mattox
97fe7311bf Resync the trunk's NEWS file with the 1.2 NEWS file.
This commit was SVN r18564.
2008-06-03 18:47:10 +00:00
Jeff Squyres
a1b0798413 Minor fixups; added bullet about openib BTL checking for
/sys/class/infiniband

This commit was SVN r18561.
2008-06-03 18:22:51 +00:00