Commit graph

9530 commits

Author SHA1 Message Date
Ralph Castain
c774f641fb Modify orterun to provide more user-friendly reporting on jobs that fail to start
This commit was SVN r14496.
2007-04-24 19:19:14 +00:00
Ralph Castain
19767802de Let the errmgr know how to deal with incomplete starts
This commit was SVN r14495.
2007-04-24 19:04:29 +00:00
Ralph Castain
ef71055cf8 Teach the odls to properly test for and report failed-to-start for application processes.
Test for system limits (where known) prior to doing things like fork and pipe since some systems aren't very nice about it when we try to exceed such limits.

This commit was SVN r14494.
2007-04-24 18:54:45 +00:00
Donald Kerr
cae24fcde1 move mca parameter registration into own .c and .h files
This commit was SVN r14493.
2007-04-24 18:34:16 +00:00
Josh Hursey
8c2385416f Per a developer request -
Make sure that the wrapper selection is compiled out if not enabling FT. Before the 
logic would skip over it since the conditional if statements would not be satisfied, 
now there are no additional if statements when compiled out.

With this modification the selection logic looks nearly identical to pre-r14051
with the exception of the non-FT related improvements.

This commit was SVN r14491.

The following SVN revision numbers were found above:
  r14051 --> open-mpi/ompi@dadca7da88
2007-04-24 17:08:48 +00:00
Ralph Castain
f5ef3d795e Tell the smr how to handle failed-to-start
This commit was SVN r14488.
2007-04-24 16:23:26 +00:00
Jeff Squyres
d8cc501384 Add missing header.
This commit was SVN r14485.
2007-04-24 14:27:51 +00:00
Jeff Squyres
0674bbd001 Fix segv when the shell is not recognized. Thanks to Mostyn Lewis for
noticing the problem.

This commit was SVN r14483.
2007-04-24 12:00:54 +00:00
Ralph Castain
2d04298002 Update the orted cmd xmit functions to match orted recv's. This fixes trac:1004.
This commit was SVN r14482.

The following Trac tickets were found above:
  Ticket 1004 --> https://svn.open-mpi.org/trac/ompi/ticket/1004
2007-04-24 01:58:40 +00:00
Jeff Squyres
50e0745c9e Update copyright.
This commit was SVN r14480.
2007-04-24 00:18:38 +00:00
Josh Hursey
260e7612ad Fix a few interface changes introduced by r14475
This commit was SVN r14479.

The following SVN revision numbers were found above:
  r14475 --> open-mpi/ompi@18b2dca51c
2007-04-23 20:18:27 +00:00
Ralph Castain
5f94d6d791 Fix the cnos rml to match revised xcast API
This commit was SVN r14478.
2007-04-23 19:07:44 +00:00
Jeff Squyres
08041a54c5 Add yet another define option for the spec file:
use_default_rpm_opt_flags.  It defaults to a value of 1, meaning that
we'll try to use $RPM_OPT_FLAGS.  But if you're not compiling with the
GNU compilers, you might want to set this value to 0 so that your
compiler doesn't get flags that it doesn't understand (e.g., PGI 7.0
will barf on flags that it doesn't understand).

This commit was SVN r14477.
2007-04-23 19:00:29 +00:00
Ralph Castain
1682a72d34 Add ability to read system limits on number of children, open files, and file size from the local OS - to be used in failed-to-start scenarios
This commit was SVN r14476.
2007-04-23 18:53:47 +00:00
Ralph Castain
18b2dca51c Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly.
There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd *really* have to try).

This also involved a slight change to the oob.xcast API, so propagated that as required.

Note: this has *only* been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-)

Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately.

This commit was SVN r14475.
2007-04-23 18:41:04 +00:00
Ralph Castain
009be1c1b5 Reorganize the orted code for easier maintenance. Add ability to deliver xcast messages to local procs (not used at this point).
This commit was SVN r14474.
2007-04-23 18:28:20 +00:00
Ralph Castain
b260f8ee36 Enable the job_family API
This commit was SVN r14473.
2007-04-23 18:26:33 +00:00
Ralph Castain
7a57b694bb Allow caller to get session directory name without anything else
This commit was SVN r14472.
2007-04-23 18:25:36 +00:00
Ralph Castain
9cd85ef55a Add a few more error constants that will help provide more definitive output to the user
This commit was SVN r14471.
2007-04-23 18:25:03 +00:00
Donald Kerr
3f428af7b8 couple of minor changes to fix #973 and separated eager rdma fragments into structure only and data only area
This commit was SVN r14470.
2007-04-23 17:41:34 +00:00
Jelena Pjesivac-Grbovic
53cbec7a09 Make coll/tuned dynamic rules more verbose (when prompted with --mca coll_base_verbose 1)
This commit was SVN r14469.
2007-04-23 16:34:52 +00:00
Brian Barrett
0a8af62c64 Fix broken build on OS X with static compiles. Everything that uses
anything in OPAL *MUST* call either opal_init() or opal_init_util().

This commit was SVN r14468.
2007-04-23 15:45:39 +00:00
Ralph Castain
477828159e Add a few test functions transferred from ORTE trunk
This commit was SVN r14467.
2007-04-23 14:43:55 +00:00
Ralph Castain
f47e7382e3 Add a new function to wake orterun up - used in failed-to-start scenarios, but can be used anytime a lower level needs to ensure orterun wakes up
This commit was SVN r14466.
2007-04-23 12:49:25 +00:00
Ralph Castain
3d4f1b86d2 Modify the name service to provide necessary support for failed-to-start scenarios. Add a new API to get_vpid_range - this should be used in place of the rmgr API of that name to avoid race conditions (will remove that API in later commit).
This commit was SVN r14465.
2007-04-23 12:48:19 +00:00
Rich Graham
ce35761683 Make sure not to go out of bounds. Element i+1 of bml_btls
is referenced, which for i = arr_size-1 is beyond the array dimensions.

This commit was SVN r14464.
2007-04-22 21:43:34 +00:00
Sharon Melamed
cf3f41288b Add pkey value MCA parameter. If this param is used,
only ports with the actual pkey value will be initiated.

This commit was SVN r14463.
2007-04-22 10:22:12 +00:00
Sharon Melamed
fc91aa6f31 Changing atoi call to strtol call to be able to
read hex values in the param list.

This commit was SVN r14462.
2007-04-22 07:45:51 +00:00
Josh Hursey
27a42f48d3 Make sure to call opal_init_util before mca_base_open().
This bug(?) became apparent due to the installdirs commit since these tools
were not finding the proper libraries since the paths were wonky.

It all looks good now. :)

This commit was SVN r14461.
2007-04-21 22:38:15 +00:00
Josh Hursey
646c2b2171 This commit fixes trac:1002.
Protect the free and strdup values for replacing keyval pairs just as we do 
below in the files for new keyval pairs.

In basic testing this seems to make everything work as it should again.

This commit was SVN r14460.

The following Trac tickets were found above:
  Ticket 1002 --> https://svn.open-mpi.org/trac/ompi/ticket/1002
2007-04-21 21:51:18 +00:00
Adrian Knoth
339dbf6cd5 Cosmetics. Enforcing style guide.
This commit was SVN r14459.
2007-04-21 21:47:25 +00:00
Josh Hursey
4159b72a60 Some minor updates to go along with commit r14457
This commit was SVN r14458.

The following SVN revision numbers were found above:
  r14457 --> open-mpi/ompi@2af38229c1
2007-04-21 21:24:44 +00:00
Josh Hursey
2af38229c1 Re-worked the implementation of the LAM-like coord component.
It's a bit longer, but much more clear in its implementation I believe.

Fundamentally it is the same, but is much more solid in the implementation.
I created quite a few directed tests that this version of the implementation 
now passes.

This commit was SVN r14457.
2007-04-21 20:35:01 +00:00
George Bosilca
f0dd3e329c Allow the installdir components to be compiled on Windows.
This commit was SVN r14455.
2007-04-21 06:30:30 +00:00
Jeff Squyres
989c4417a1 Fixes trac:982.
Thank God for google-able mailing list archives:

    http://www.mail-archive.com/bug-libtool@gnu.org/msg00899.html

We ran into this exact bug in Libtool that was causing the C++
bindings library to be compiled incorrectly (therefore causing static
initializers to not fire properly when in a shared library, which is
the default installation configuration).  Putting in some libtool
patches to fix the problem -- will be mailing the Libtool crowd
shortly to ask for a better fix...

This commit was SVN r14454.

The following Trac tickets were found above:
  Ticket 982 --> https://svn.open-mpi.org/trac/ompi/ticket/982
2007-04-21 00:56:47 +00:00
Jeff Squyres
401a072888 Revert r14435 -- it breaks compiling on Linux with at least the PGI
compiler suite.  The rule is that ompi_info.h is supposed to be the
''first'' file included so that it can affect system header files if
necessary (and it is sometimes necessary, such as with the PGI
compiler suite).

If this breaks VC on Windows, we'll have to find another fix.  More on
the mailing list...

This commit was SVN r14453.

The following SVN revision numbers were found above:
  r14435 --> open-mpi/ompi@a20b43ace9
2007-04-21 00:51:31 +00:00
Jeff Squyres
b5aeb235be Make NEWS match the 1.2 branch NEWS
This commit was SVN r14452.
2007-04-21 00:34:13 +00:00
Jeff Squyres
5bebd24250 Bring over Brian's installdirs fixes from this afternoon (r14445).
This commit was SVN r14450.

The following SVN revision numbers were found above:
  r14445 --> open-mpi/ompi@13d366b827
2007-04-21 00:16:31 +00:00
Jeff Squyres
0ba47105ed Merge the /tmp/jms-installdirs-trunk branch into the trunk. This
finally brings in functionality that is already on the 1.2 branch, and
was developed and tested in the v1.2ofed branch (and other places).

Short version of new features:

 * Support for ibv_fork_init() 
 * Automatically fill in the openib BTL bandwidth value by 
   querying the HCA port 
 * Installdirs functionality 
 * Fixes to always use -I in the Fortran wrapper compilers (#924) 
 * Gleb's mpool updates 
 * Remove some kruft in btl/openib/configure.m4, therefore 
   fixing the harmless warnings noted in #665 
 * Bunches of updates to the Linux RPM spec file 

I.e., effectively the same thing that r14411 brought to the v1.2
branch.

Also effectively brought in r14432 and r14433 (some fixes on top of
the original r14411 commit to v1.2).  Still need to bring in the moral
equivalent of r14445 after this commit (fixes to installdirs).

This commit was SVN r14449.

The following SVN revision numbers were found above:
  r14411 --> open-mpi/ompi@83b31314ae
  r14432 --> open-mpi/ompi@a48f160595
  r14433 --> open-mpi/ompi@68f346d2bc
  r14445 --> open-mpi/ompi@13d366b827
2007-04-21 00:15:05 +00:00
Josh Hursey
eef364546c Check for NULL before trying to use the variable.
This commit was SVN r14444.
2007-04-20 17:17:11 +00:00
Brian Barrett
146025bd0a It is allowable to pass NULL as the first argument to opal_argv_split (whether
that makes sense or not, it is allowed and it is done).  Remove the compiler
hint that the argument will never be NULL.  Fixes a segfault in oob init
code when opal_argv_split() was called with a NULL first argument.

This commit was SVN r14440.
2007-04-20 15:31:26 +00:00
Jeff Squyres
40f4b60a2a Use #ifdef, not #if
This commit was SVN r14439.
2007-04-20 14:26:20 +00:00
Shiqing Fan
a20b43ace9 The header file ompi_config.h should be included in the ompi_info.h file, not in the .cc files. This lets it compile with the VC compiler, and it makes no difference on Linux systems.
This commit was SVN r14435.
2007-04-20 09:03:16 +00:00
Josh Hursey
b9da59ebc3 Fix the way we determine which sequence number to restart with.
Create a sentinel value in the metadata file to clearly indicate
that the sequence number is complete (versus in progress). This
way we do not try to restart from an invalid sequence number
which can lead to badness.

This commit was SVN r14423.
2007-04-19 15:04:27 +00:00
Josh Hursey
12e5d0e817 ft_event Commit:
- Move the PML Modex stuff out of the BML -- Abstraction violation.
- Also fix the location of the add_procs with respect to the stage gates.

This commit was SVN r14422.
2007-04-19 03:05:12 +00:00
Josh Hursey
d12ddcdb7a Protect the free since if we never send any messages this could be NULL.
This commit was SVN r14421.
2007-04-19 02:17:50 +00:00
George Bosilca
51fc2474f1 Don't keep the data attached to a fragment segmented when we have
to move it into the unexpected queue. Instead pack the data in
only one buffer. Now the code looks more optimized and clear, but
I have a doubt about who's using this functionality. I think that
all BTLs always return only one memory segment attached to the
matching fragment (i.e. there is no unexpected iov type receive).

This commit was SVN r14416.
2007-04-18 15:52:11 +00:00
Sven Stork
037b01ce9e - more symbols that need to be exported
This commit was SVN r14415.
2007-04-18 14:53:56 +00:00
Sven Stork
b5f1538d21 - export the required symbols
This commit was SVN r14414.
2007-04-18 13:27:28 +00:00
Jeff Squyres
1e364218a2 Remove unused variable
This commit was SVN r14413.
2007-04-18 13:10:10 +00:00