
11746 Commits

Author SHA1 Message Date
Ralph Castain
f799ea225f Orterun creates a "clean" copy of its environment for use in launching procs. This includes properly setting LD_LIBRARY_PATH and PATH, among other things. Unfortunately, our PLM modules were using the local environ instead of the saved copy, thus missing a number of things that really should have been included. From what I see, we got away with the error because the PLMs were duplicating all that setup logic themselves - I'll clean this up over the next few days.
In the meantime, correct the PLMs so they use the correct environ for launching.

This commit was SVN r18713.
2008-06-23 22:39:36 +00:00
Ralph Castain
f70b7e51ce Fix a missing header file and ensure we use a portable name for a system limit
This commit was SVN r18712.
2008-06-23 22:32:26 +00:00
Tim Mattox
378b0010c5 Resync the NEWS file with 1.2.7 changes.
This commit was SVN r18711.
2008-06-23 19:30:13 +00:00
Ralph Castain
0fa9d88009 Set $PWD for the application proc to match the cwd. If the user specifies a working dir via -wdir, this ensures that the environment variable matches what they get from getcwd. Note that any subsequent calls to chdir in the user's program will break that equivalence - we can only ensure it starts out matching!
This commit was SVN r18709.
2008-06-23 18:25:41 +00:00
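The $PWD behavior described in commit 0fa9d88009 can be sketched in C, assuming a POSIX system. The helper name `set_wdir_and_pwd` and the fixed buffer size are hypothetical illustrations, not ORTE's actual code; the point is only the ordering: chdir() into the requested directory first, then set $PWD from what getcwd() reports.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical buffer size (an assumption); real code would handle
 * longer paths. */
#define WDIR_BUF_LEN 4096

/* Hypothetical helper, not the actual ORTE implementation.
 * Change into `wdir`, then export $PWD from getcwd() so that the
 * environment variable matches the real cwd when the proc starts.
 * Returns 0 on success, -1 on any failure. */
static int set_wdir_and_pwd(const char *wdir)
{
    char cwd[WDIR_BUF_LEN];

    if (chdir(wdir) != 0) {
        return -1;                  /* directory missing or not accessible */
    }
    /* Ask the OS for the resulting path instead of copying `wdir`:
     * `wdir` may be relative or go through symlinks, and $PWD should
     * agree with what getcwd() reports. */
    if (getcwd(cwd, sizeof(cwd)) == NULL) {
        return -1;
    }
    /* Overwrite any inherited $PWD (third argument = 1). */
    return setenv("PWD", cwd, 1) == 0 ? 0 : -1;
}
```

As the commit notes, this only guarantees that $PWD and getcwd() agree at startup; a later chdir() in the user's program will not update $PWD.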
Ralph Castain
acbcbb81b5 Add some debugging output to the modex set_proc_attr function to see what is being added to the modex
This commit was SVN r18708.
2008-06-23 18:24:08 +00:00
Jeff Squyres
bdaaf01d8a Fixes trac:1338: Have the MCA base specifically check for all requested
components.  If they are not found / able to be opened, a warning will
be printed and the mca_base_component_find() will return
OPAL_ERR_NOT_FOUND.  It is the upper layer's responsibility to handle
this error appropriately.

This commit was SVN r18707.

The following Trac tickets were found above:
  Ticket 1338 --> https://svn.open-mpi.org/trac/ompi/ticket/1338
2008-06-23 16:14:05 +00:00
Lenny Verkhovsky
937380df2f Memory check after allocation in SM fixed
This commit was SVN r18706.
2008-06-22 14:52:44 +00:00
Jeff Squyres
51d833e8d1 Minor fixes and comment clarifications for MPI-2.1-mandated handling
of strings.  We mostly did the Right Things already; I simplified the
code a bit and also had us not write more characters in the C
bindings than we're supposed to (per language in the MPI-2.1 spec).

Fixes trac:1238.

This commit was SVN r18705.

The following Trac tickets were found above:
  Ticket 1238 --> https://svn.open-mpi.org/trac/ompi/ticket/1238
2008-06-21 19:33:47 +00:00
Jeff Squyres
24c3aa1d77 Really fix "make dist". Really.
This commit was SVN r18704.
2008-06-21 18:04:38 +00:00
Jeff Squyres
74089a0593 Add a bunch of bullets about improvements in the openib BTL.
This commit was SVN r18703.
2008-06-21 13:35:03 +00:00
Jeff Squyres
281c37afcc Ensure that we ignore the "empty" CPC components.
This commit was SVN r18702.
2008-06-21 11:39:53 +00:00
Jeff Squyres
807e2cc742 Mark a notable place where we need to return an error up to the BTL or
PML.

This commit was SVN r18701.
2008-06-20 22:11:49 +00:00
Jeff Squyres
5ded50df0e * Fix a > that should be ==
 * Ensure that we destroy the correct QP (local->id[num]->qp will always
   have a valid pointer in it, even if we set up a dummy QP)
 * Note two notable places where we need to figure out how to
   propagate errors up from the CPC to the main BTL / PML when errors
   occur.  Probably have the same issue in IBCM, too.

This commit was SVN r18700.
2008-06-20 22:09:30 +00:00
Jeff Squyres
0074126886 Per #1352, most iWARP adapters today cannot handle connections between
two processes on the same server (!).  So for today, we'll simply mark
all local processes that use iWARP adapters as "unreachable".

More details in #1352.

This commit was SVN r18699.
2008-06-20 22:08:00 +00:00
Jeff Squyres
f4145fce7a Ensure that we don't try to shut down a thread that is not [yet] there
(e.g., if you're excluding some devices, their destructors will be
invoked before the async event thread was set up for them).

This commit was SVN r18698.
2008-06-20 19:30:51 +00:00
Jeff Squyres
ed17b51204 Adjust the max_inline default size down so that it can be accepted on
multiple adapters (e.g., Chelsio T3).

But we need to figure out how to determine a good value for the
resident adapter(s) at runtime.  It's problematic because, for
example, Mellanox ConnectX and Chelsio T3 report max_inline values
differently at run-time.  If you ibv_create_qp with a max_inline value
of 0, ConnectX reports back a value computed from a formula based on a few
other values (e.g., max_send_sge and max_recv_sge).  But T3 always
reports back "64".

We're looking into this to figure out the best way -- reducing the
default right now should allow other adapters to run while we figure
it out.

This commit was SVN r18697.
2008-06-20 18:24:04 +00:00
Jeff Squyres
930667ac73 Ensure that orte-checkpoint and orte-restart man pages are always
included in the distribution tarball.  This ''appears'' to be an
Automake bug -- I have submitted a bug report to the bug-automake list:

http://lists.gnu.org/archive/html/bug-automake/2008-06/msg00019.html

This commit was SVN r18696.
2008-06-20 18:19:01 +00:00
George Bosilca
54e7e03695 One less warning.
This commit was SVN r18695.
2008-06-20 17:50:19 +00:00
Jeff Squyres
f366f49179 Remove some leftover cruft; the STL is no longer used, so this lock is
no longer necessary.

This commit was SVN r18694.
2008-06-20 13:39:26 +00:00
Jeff Squyres
7a1206d912 Two more minor changes:
 * Put the variable in the MPI namespace; keeps it safely segregated
   from user apps
 * Need to actually "extern" the variable to make the compiler not
   complain that the variable is never referenced

This commit was SVN r18693.
2008-06-20 13:36:02 +00:00
Ralph Castain
c693d3a5d1 I hadn't honestly considered before that an MPI process might attempt to call functions in the routed framework intended solely for daemons and HNPs. By design, MPI processes are not allowed to route RML/OOB messages, and hence the routed module in an MPI process has no knowledge whatsoever of how a message will reach its destination (except in the direct module). Thus, it has no way to return a valid routing tree, update a routing tree, or get wireup info.
This commit ensures that attempts to access information that is unknowable or undefined return appropriate invalid or not_supported values, avoiding unexpected behavior and/or segfaults.

This commit was SVN r18692.
2008-06-20 03:26:13 +00:00
Jeff Squyres
7905db57bd Slightly decrease the number of buffers for the NetXen adapter
This commit was SVN r18691.
2008-06-20 01:00:22 +00:00
Jeff Squyres
d6e4ea3803 Two tidbits to make autogen.sh a little more robust:
 * Ensure that various aliases and color settings don't muck up some
   common shell commands that we use in autogen.sh
 * Per Ralf W's suggestion, properly []-ize the first argument to
   m4_define()

This commit was SVN r18690.
2008-06-20 00:39:43 +00:00
Ralph Castain
5ebe10ebf1 Fix a bad typo - need to look at the node array as the arch array hasn't been built yet
This commit was SVN r18689.
2008-06-19 21:34:39 +00:00
Ralph Castain
174b9f1482 Ensure this module works in heterogeneous environments.
Note: this module is under development, which is why it is not set as the default. Use at your own risk!

This commit was SVN r18688.
2008-06-19 19:40:47 +00:00
Ralph Castain
ccbf194e8f Visibility fix
This commit was SVN r18687.
2008-06-19 19:08:08 +00:00
Pak Lui
119df10349 Fix the debugging messages
This commit was SVN r18686.
2008-06-19 18:54:20 +00:00
Pak Lui
a924b4a7f4 Define the symbols to allow parallel debuggers to dlopen
the shared object when it is compiled with the Sun Studio C compiler.
Depending on where the extern variables declared in the headers are
initialized, there can be cases where no storage is allocated
for the variables, so the symbols may not be defined
when the debugger tries to dlopen this message queue DLL.

This commit was SVN r18685.
2008-06-19 18:49:25 +00:00
Ralph Castain
26c9ad5799 Clean up the DSS API to remove two functions that are supposed to be used solely internally to the DSS. These were likely exposed because we need to call them when packing/unpacking declared types, but this means that developers may accidentally use the wrong functions, causing the DSS buffer to get confused. Instead, return the system to the way it used to work and hide those functions.
This commit was SVN r18684.
2008-06-19 18:46:25 +00:00
George Bosilca
bc9b950162 Honor ^ for the PML selection.
This commit was SVN r18683.
2008-06-19 16:50:46 +00:00
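For the ^ syntax honored in bc9b950162: in Open MPI's MCA selection lists, a leading ^ turns the whole comma-separated list into an exclusion list (for example, a PML value of `^cm` means "any PML except cm"). The sketch below is a hypothetical helper, `component_selected`, showing that matching rule only; it is not Open MPI's actual selection code and ignores details such as whitespace and mixed include/exclude errors.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not Open MPI's implementation: decide whether
 * component `name` is selected by a list such as "ob1,cm" (include only
 * these) or "^cm" (exclude these). A leading '^' negates the whole list. */
static bool component_selected(const char *list, const char *name)
{
    bool exclude = false;
    size_t name_len = strlen(name);

    if (list == NULL || *list == '\0') {
        return true;                  /* no list: every component is eligible */
    }
    if (*list == '^') {
        exclude = true;
        ++list;                       /* skip the negation marker */
    }
    /* Walk the comma-separated tokens looking for an exact match. */
    for (const char *tok = list; *tok != '\0'; ) {
        const char *end = strchr(tok, ',');
        size_t len = end ? (size_t)(end - tok) : strlen(tok);

        if (len == name_len && strncmp(tok, name, len) == 0) {
            return !exclude;          /* listed: included, or excluded if ^ */
        }
        if (end == NULL) {
            break;                    /* last token checked */
        }
        tok = end + 1;
    }
    return exclude;                   /* unlisted: kept only in ^ mode */
}
```

For example, `component_selected("^cm", "ob1")` is true and `component_selected("^cm", "cm")` is false.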
Josh Hursey
b78ae13bf3 Add back a missing header that was removed in r18664
This commit was SVN r18682.

The following SVN revision numbers were found above:
  r18664 --> open-mpi/ompi@0532d799d6
2008-06-19 16:08:27 +00:00
Ralph Castain
265b4de5de Ensure that the call to orte_routed is properly protected at compile time when RTE support is disabled
This commit was SVN r18681.
2008-06-19 15:20:06 +00:00
Jeff Squyres
e4172a3c44 Shift the AM "if" logic down from orte/tools/Makefile.am to the
individual orte/tools/*/Makefile.am files.  This causes "make" to
traverse into every directory, even if it's not going to build anything
in that directory (which is a good thing).  It also helps with cleanup and
dist issues.

This also affects orte-checkpoint and orte-restart, but I couldn't get
--with-ft to compile properly; I'll pass along a heads-up to Josh to
ensure that I didn't break anything.

This commit was SVN r18680.
2008-06-19 14:46:10 +00:00
Jeff Squyres
a884eebdf1 This warning has been bugging me in MTT nightly runs forever: make
the char string ident in the C++ library non-static so that other
places can see it.  This makes the C++ library version string
analogous to all the other version strings.

This commit was SVN r18679.
2008-06-19 14:40:37 +00:00
Ralph Castain
571f483c39 Ensure that we don't breakpoint the debugger until -after- all procs have reported their contact info so we can successfully send the release message
This commit was SVN r18678.
2008-06-19 14:37:46 +00:00
Ralph Castain
3b5e80fa61 Shift responsibility for preconnecting the oob to the orte routed framework, which is the only place that knows what needs to be done. Only the direct module will actually do anything - it uses the same algo as the original preconnect function.
This commit was SVN r18677.
2008-06-19 13:48:26 +00:00
Jeff Squyres
7e45b24001 MPIR_being_debugged is an int, not a bool.
This commit was SVN r18676.
2008-06-19 13:31:34 +00:00
Pavel Shamis
4537827973 Making the qp allocation more optimized.
- sq parameter was replaced with max_inline parameter
- inline is allocated only for relevant QPs

This commit was SVN r18675.
2008-06-19 08:40:39 +00:00
Ralph Castain
b56f8ced4f Ensure params are registered prior to parsing global cmd line options in orterun so that debugger options are properly captured and acted upon.
Ensure that routes to remote procs are set on the HNP before completing launch so that the debugger message can be sent. Solves a race condition that can exist in those environments where the HNP does not have local procs.

This commit was SVN r18674.
2008-06-19 02:58:14 +00:00
Ralph Castain
955d117f5e Add a new grpcomm module that mimics the old 1.2 behavior - it -always- does a modex because it always includes the architecture. Hence, we called it "blind-and-dumb" since it doesn't look to see if this is required - moniker of "bad". :-)
Update the ESS API so we can update the stored archs should the modex include that info. Update ompi/proc to check/set the arch for remote procs, and add that function call to mpi_init right after the modex is done.

Setup to allow other grpcomm modules to decide whether or not to add the arch to the modex, and to detect if other entries have been made. If not, then the modex can just fall through. Begin setting up some logic in the "basic" module to handle different arch situations.

For now, default to the "bad" module so we will work in all situations, even though we may be sending around more info than we really require.

This fixes ticket #1340

This commit was SVN r18673.
2008-06-18 22:17:53 +00:00
Jeff Squyres
da6aa57efb Change up the logic a bit to handle the Red Storm case a bit better
(still need the real yod environment variable name), and add in some
lengthy comments explaining the two different methods of debugger
attach that we're using.

MPI handle debugging doesn't seem to be working at the moment; still
checking into it...

This commit was SVN r18672.
2008-06-18 21:33:08 +00:00
Pak Lui
188c8bce5d Fix the SEGV when module_get finds that no proc is bound. Also make no-intr available for processor binding.
This commit was SVN r18671.
2008-06-18 16:03:08 +00:00
Ralph Castain
282a220e7e Update the debugger interface per email thread with Jeff and Brian. Handoff to them for final test and validation
This commit was SVN r18670.
2008-06-18 15:28:46 +00:00
Brian Barrett
558e68088c Update Cray XT3 (catamount) platform files to use script wrapper compilers,
properly disable building contrib packages, and always build ROMIO

This commit was SVN r18669.
2008-06-18 14:41:25 +00:00
George Bosilca
1dba362a01 Make the ompi_ddt_dump function externally visible.
This commit was SVN r18668.
2008-06-18 08:37:43 +00:00
George Bosilca
f97a728dc6 Don't cast the int32_t pointer into a long pointer. This doesn't work on
64-bit architectures.

This commit was SVN r18667.
2008-06-18 08:33:58 +00:00
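The problem fixed in f97a728dc6 can be illustrated with a short C sketch, assuming an LP64 platform (such as 64-bit Linux) where `long` is 8 bytes and `int32_t` is 4. Reading through `(long *)` on an `int32_t` object then touches 8 bytes of a 4-byte object (and violates C's aliasing rules). The helper names below are hypothetical; the sketch shows the usual portable alternative of converting the value rather than reinterpreting the pointer.

```c
#include <stdint.h>

/* Broken pattern (do not use):
 *     long v = *(long *) p32;    reads 8 bytes from a 4-byte object on LP64
 */

/* Hypothetical helper: read through the correct type, then widen.
 * This is a value conversion, not a pointer reinterpretation. */
static long widen_int32(const int32_t *p32)
{
    return (long) *p32;
}

/* Hypothetical helper: write back by narrowing explicitly instead of
 * storing through a long*. The caller must ensure `value` fits in
 * 32 bits. */
static void store_as_int32(int32_t *p32, long value)
{
    *p32 = (int32_t) value;
}
```

On ILP32 platforms `long` and `int32_t` happen to have the same size, which is why the pointer cast can appear to work there.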
George Bosilca
8e7c35e76c These symbols are only available via the module/component structure, so they
don't have to be globally visible.

This commit was SVN r18666.
2008-06-18 08:20:02 +00:00
George Bosilca
0f9b9c0aff Remove a warning and add a required header (otherwise we cannot compile when
--disable-debug is specified).

This commit was SVN r18665.
2008-06-18 08:10:02 +00:00
Ralph Castain
0532d799d6 Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.

This commit was SVN r18664.
2008-06-18 03:15:56 +00:00
Jeff Squyres
d0cfca5990 Documentation describing how TV (and others like it) attach to MPI
processes.  Originally downloaded from
http://www-unix.mcs.anl.gov/mpi/mpi-debug/mpich-attach.txt -- cached
here in case that file ever disappears someday.

This commit was SVN r18663.
2008-06-17 21:19:34 +00:00