Commit graph

5948 commits

Author SHA1 Message Date
Andrew Friedley
82ee2933a5 - Add an opal_show_help() to the pls fork module to explain what went wrong when the execv to start the application fails.
- Add a couple opal_show_help()'s to indicate when not enough slots/nodes are available to satisfy a request.

This commit was SVN r7555.
2005-09-30 14:30:21 +00:00
Jeff Squyres
fcef1774d5 Per advice from Ralf W., change the pkgdata declarations in
Makefile.am's to be a *slightly* more correct (and, more importantly,
less error-prone) construct.

This commit was SVN r7554.
2005-09-30 13:32:39 +00:00
Jeff Squyres
80b7deb4d7 Add in EXTRA_DIST to get helpfile in tarballs
This commit was SVN r7553.
2005-09-30 10:25:04 +00:00
Brian Barrett
e98a6d32d7 * fix compiler warning about void* -> function pointer casting. Stupid
compilers and return type of dlsym()

This commit was SVN r7552.
2005-09-30 05:15:27 +00:00
Brian Barrett
b808fb82c9 * fix compiler warning about void* -> function pointer casting. Stupid
compilers and return type of munmap....

This commit was SVN r7551.
2005-09-30 04:57:08 +00:00
Brian Barrett
7b20370306 * pretty-print an error message if a btl component loads but can't find
any NICs to use
* Make mvapi, gm, and mx components all publish information, even if there
  are no NICs available so that modex_recv doesn't hang.  If there are no
  NICs available, don't set the reachable bit, but don't do anything
  to fail.  This unfortunately doesn't cover the hangs that will result if
  different procs load different sets of components, but it's a start

This commit was SVN r7550.
2005-09-30 04:39:44 +00:00
Brian Barrett
e0c3775551 * remove some duplicate dependencies that were making Solaris mad
This commit was SVN r7549.
2005-09-30 04:13:26 +00:00
Brian Barrett
a77c908496 * the last of the tuning params for portals
This commit was SVN r7548.
2005-09-30 04:05:31 +00:00
Galen Shipman
8239e635b9 fix misc warnings, cleanup macro..
This commit was SVN r7547.
2005-09-30 03:13:51 +00:00
Galen Shipman
05e6e51fec re-reg from min of bases and max of bounds
add byte counting for total registered memory 

This commit was SVN r7546.
2005-09-29 21:28:54 +00:00
Jeff Squyres
bc181d7130 Remove the .ompi_ignore so that everyone starts compiling this, but
lower the default priority to 0 so that it's not active unless you
specifically ask for it (this component needs more testing by people
other than me before we unleash it on the public).

This commit was SVN r7545.
2005-09-29 18:05:47 +00:00
Jeff Squyres
d4b7618db7 Comment out what seems to be a debugging output. Will confirm with
George.

This commit was SVN r7544.
2005-09-29 16:39:27 +00:00
Josh Hursey
d39841174d Must release the lock before entering the non blocking recv, since
it is possible that if the receive has already arrived the callback will
be called before recv_buffer_nb() returns. This causes deadlock
as we try to acquire the lock, but already hold it.

This was causing orterun and orteds to stall in certain situations.
Became evident when stress testing dynamics with remote nodes.

This commit was SVN r7543.
2005-09-29 14:24:11 +00:00
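The self-deadlock described in d39841174d can be illustrated with a minimal Python sketch. This is not the ORTE code: `recv_buffer_nb` below is a hypothetical stand-in that, as the message describes for the real call, may run the callback before it returns when data has already arrived. `on_recv`, `post_recv`, and `delivered` are likewise names invented for the illustration. The callback uses a non-blocking acquire only so that the sketch can report the would-be deadlock instead of hanging.

```python
import threading

lock = threading.Lock()   # non-reentrant, like a plain mutex
delivered = []

def on_recv(msg):
    # The callback needs the same lock the caller may still be holding.
    # A blocking acquire here would hang forever in that case; the
    # non-blocking form lets this sketch report the problem instead.
    if not lock.acquire(blocking=False):
        return False
    try:
        delivered.append(msg)
        return True
    finally:
        lock.release()

def recv_buffer_nb(callback, pending):
    # Hypothetical stand-in: if a message has already arrived, the
    # callback fires immediately, before this function returns.
    if pending:
        return callback(pending.pop(0))
    return None

def post_recv(release_first):
    lock.acquire()
    # ... update shared state under the lock ...
    if release_first:
        # The fix: drop the lock before posting the receive.
        lock.release()
        return recv_buffer_nb(on_recv, ["hello"])
    try:
        # The bug: the callback runs while the lock is still held.
        return recv_buffer_nb(on_recv, ["hello"])
    finally:
        lock.release()
```

Calling `post_recv(False)` shows the callback failing to get the lock, while `post_recv(True)` lets it run to completion.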
Brian Barrett
997644af31 * There are now two forms of ibv_create_cq, one with 3 params and one with 5.
Try to detect which form this version of Open IB uses, defaulting to the 5
  version if we can't figure it out (the new version has 5 params)
* Only add -lcm if it exists on the system - some versions of Open IB
  apparently don't need it.

This commit was SVN r7542.
2005-09-29 13:35:57 +00:00
Jeff Squyres
de1c8fb125 - Make debug output a bit more accurate and readable
- Fix bug identified by users: --prefix may also apply on the local
  node; we need to prefix the PATH and LD_LIBRARY_PATH environment
  variables before invoking execve()

This commit was SVN r7541.
2005-09-29 12:35:43 +00:00
Jeff Squyres
fa4f7a6261 Move assignment out of inner loop -- only needs to be done once, and
fixes a compiler warning (and potential bug)

This commit was SVN r7540.
2005-09-29 10:09:20 +00:00
Josh Hursey
c11ba09655 Remove the progress engine stuff from abort. This was causing
some orted's to stall on locks in the MPI Dynamics cases. Since it
is not essential that we call these functions, they can go away.

Unlock the peer lock when aborting. This causes a potential deadlock
in do_waitall [see comment in code]. This was causing orteds to
deadlock at times when the seed had terminated. With proper interleaving
and timing the orted was deadlocking. This seems to have fixed this in 
my stress testing with MPI 2 Dynamics.

This commit was SVN r7539.
2005-09-29 05:04:43 +00:00
Josh Hursey
e825b4522f Upon further investigation the fix in r7537 was an anomaly of zero'ing out the
bits to expose the low bits being set. We were casting from a size_t to a void*
which is not good when working with big endian machines.

This fix makes MPI 2 dynamics work on PPC 64 (tested with a Linux OS).

This commit was SVN r7538.

The following SVN revision numbers were found above:
  r7537 --> open-mpi/ompi@fd45714c03
2005-09-28 23:50:42 +00:00
Josh Hursey
fd45714c03 For some reason we have to initialize this variable or bad things happen in the
comm->c_coll.coll_bcast of the rnamebuflen.

This fixes the threaded MPI 2 Dynamics stuff. Should be working great now! Yay!

This commit was SVN r7537.
2005-09-28 22:30:41 +00:00
Galen Shipman
3ded88a3c0 use addr +size -1 instead of base->addr as base->addr is down_aligned.
This commit was SVN r7536.
2005-09-28 20:19:33 +00:00
Galen Shipman
26a74d42fa release, not retain on gm_free
This commit was SVN r7535.
2005-09-28 20:18:52 +00:00
Brian Barrett
f36642b73b * fix compile error. i should know better
This commit was SVN r7534.
2005-09-28 18:18:12 +00:00
Brian Barrett
ad62411a15 * fix deadlock in memory code that occurs if OBJ_NEW calls realloc to
expand its class list array by pre-allocating the callback item before
  we obtain the lock (we free it if we ended up not needing it)

This commit was SVN r7533.
2005-09-28 16:40:19 +00:00
Edgar Gabriel
67dd52efb1 making the allreduce and reduce_scatter tests pass as well
This commit was SVN r7532.
2005-09-28 15:12:05 +00:00
Josh Hursey
75419313f7 check the return code and do something reasonable, instead of progressing and hanging on error
This commit was SVN r7531.
2005-09-28 06:13:51 +00:00
Galen Shipman
c1f5543f62 need to call mpool_release on all registrations obtained in the pml.
sanity checks 

This commit was SVN r7530.
2005-09-28 04:49:40 +00:00
Josh Hursey
4cf4b4ea86 Fix for MPI 2 dynamics.
The NS replica should give out tags that are over ORTE_RML_TAG_DYNAMIC
or it will overlap with other outstanding tags. This overlap was killing
MPI_Comm_spawn when a program tried to use it multiple times (> 3).

With this fix MPI_Comm_spawn is behaving properly.
A program can call it many times in a row without problem.

NOTE: Not tested for multi-threaded build yet

(A long time debugging for a one liner... :/)

This commit was SVN r7529.
2005-09-28 03:20:43 +00:00
Galen Shipman
b9b78f8f5d modify rcache_rb to find registrations in the middle of a base and bound
This commit was SVN r7528.
2005-09-28 02:11:35 +00:00
Brian Barrett
676e34c2d4 * play the linker games that need to be played to shut Darwin up
This commit was SVN r7527.
2005-09-28 01:24:05 +00:00
Edgar Gabriel
dbbbd416df fixing MPI_IN_PLACE for the log-reduce algorithm.
This commit was SVN r7526.
2005-09-27 21:51:55 +00:00
Galen Shipman
0fc17cedee change order of ops on register
This commit was SVN r7525.
2005-09-27 21:43:41 +00:00
Jeff Squyres
285ded5655 - Ensure to have !initialized || finalized test *first*
- If we have an NS error, don't return an error -- this function's
  purpose is to abort :-)
- s/abort()/exit(1)/ so that we don't drop massive corefiles

This commit was SVN r7524.
2005-09-27 20:26:38 +00:00
Jeff Squyres
88ab3dcdf0 Fix a bunch of stupid typos
This commit was SVN r7523.
2005-09-27 19:31:20 +00:00
Jeff Squyres
26efa2c4cd Add a default prefix to stream 0 ("[hostname:pid]") to help identify
where output is coming from.  Threw in a few minor style fixes while I
was editing these files.

This commit was SVN r7522.
2005-09-27 19:27:26 +00:00
Galen Shipman
09e67ce4fd fix off by one on up_align_addr, use base and bound instead of base_align and
bound_align.. 

This commit was SVN r7521.
2005-09-27 18:10:44 +00:00
Jeff Squyres
621fb2b99e - Add more missing fortran values
- Add warnings to not tamper with constant values in mpi.h.in without
  also adjusting mpif.h.in

This commit was SVN r7520.
2005-09-27 17:47:21 +00:00
Brian Barrett
9d2c2d4ab8 * add declarations for MPI_SEEK_CUR and MPI_SEEK_END
This commit was SVN r7519.
2005-09-27 17:20:41 +00:00
Jeff Squyres
31b2ec198b Fix problem identified by Ferris McCormick -- we were missing some
out-of-bounds protection for output ID's in some functions.  Also,
move some logic for closing the syslog inside a conditional block --
it's only necessary to close the syslog if we actually closed a
stream.

This commit was SVN r7518.
2005-09-27 16:43:48 +00:00
Brian Barrett
7c924dc221 * don't try to fire up orte - nothing good comes from trying to open all those
components...

This commit was SVN r7517.
2005-09-27 14:48:41 +00:00
Galen Shipman
51f1c7a8e4 bring ompi_fifo up to new mpool interface, looks like this has been stale for
some time. Comment out an incorrect test in ompi_rb_tree.c 

This commit was SVN r7516.
2005-09-27 14:36:53 +00:00
Galen Shipman
af04b3e1ab fix warnings..
This commit was SVN r7515.
2005-09-27 14:23:51 +00:00
Brian Barrett
70e59774e9 * add some missing ignores
This commit was SVN r7514.
2005-09-27 13:43:38 +00:00
Josh Hursey
a23370c007 Converted some MCA parameters from the old version to the new.
Have the ras_base_schedule_policy MCA parameter working once again. Before, it
would only do slot based allocation, even if the MCA parameter was set properly.

Currently you can specify to orterun a node allocation by either:
-mca ras_base_schedule_policy node
-bynode

and slot allocation (which is the default) by:
-mca ras_base_schedule_policy slot
-byslot

This commit was SVN r7513.
2005-09-27 02:54:15 +00:00
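The difference between the two policies in a23370c007 can be sketched in a few lines of Python. This is an illustration of the general by-slot versus by-node idea, not the ORTE allocator: `map_processes` and its `(name, slots)` input format are hypothetical, and refusing to oversubscribe is a simplification made for the example.

```python
def map_processes(nodes, nprocs, policy="slot"):
    """nodes: list of (name, slots). Returns one node name per rank."""
    total = sum(slots for _, slots in nodes)
    if nprocs > total:
        # Simplification for this sketch: no oversubscription.
        raise ValueError("not enough slots")
    if policy == "slot":
        # Fill every slot on a node before moving on to the next node.
        order = [name for name, slots in nodes for _ in range(slots)]
        return order[:nprocs]
    if policy == "node":
        # Round-robin across nodes, skipping nodes whose slots are used up.
        remaining = {name: slots for name, slots in nodes}
        placement = []
        while len(placement) < nprocs:
            for name, _ in nodes:
                if len(placement) < nprocs and remaining[name] > 0:
                    remaining[name] -= 1
                    placement.append(name)
        return placement
    raise ValueError(f"unknown policy: {policy}")
```

With two nodes of two slots each and three processes, the slot policy places two ranks on the first node, while the node policy alternates between the nodes.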
Jeff Squyres
844cf6ae39 Add missing declarations for MPI_SEEK_SET, MPI_MODE_CREATE,
MPI_MODE_WRONLY

This commit was SVN r7512.
2005-09-27 02:37:11 +00:00
Brian Barrett
80ac5c2efd * there are now two upcoming points where we want to release a version with
a random string of characters as part of the version number (the really
  soon to happen 1.0lanl release and the 1.1sc2005 release that we've
  talked about).  So rather than having alpha and beta fields that must
  be numeric values, have a general field that can be any alphanumeric
  value.

This commit was SVN r7511.
2005-09-27 02:06:05 +00:00
Galen Shipman
3c97b3f722 Modified the registration to include a base_align and bound_align for
searching the tree. Modified the memory callback to search the tree at each
page boundary for registrations. This is necessary as an application may
malloc memory and send out of any portion of that memory, even discontiguous
regions. 

This commit was SVN r7510.
2005-09-27 02:01:21 +00:00
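A rough Python sketch of the idea in 3c97b3f722: keep page-aligned copies of a registration's base and bound, and probe every page of a released range for registrations that cover it. Everything here is an assumption for illustration: the 4 KiB page size, the helper names, the dictionary layout, and the linear scan that stands in for the tree search.

```python
PAGE_BITS = 12  # assumed 4 KiB pages

def down_align(addr, bits=PAGE_BITS):
    # First byte of the page containing addr.
    return addr & ~((1 << bits) - 1)

def up_align(addr, bits=PAGE_BITS):
    # Last byte of the page containing addr.
    return addr | ((1 << bits) - 1)

def make_registration(addr, size):
    base, bound = addr, addr + size - 1      # inclusive bound
    return {"base": base, "bound": bound,
            "base_align": down_align(base),
            "bound_align": up_align(bound)}

def registrations_touching(regs, addr, size):
    """Probe each page of [addr, addr + size) and collect the
    registrations whose aligned range covers that page."""
    hits = []
    page = down_align(addr)
    last = addr + size - 1
    while page <= last:
        for reg in regs:
            covered = reg["base_align"] <= page <= reg["bound_align"]
            if covered and reg not in hits:
                hits.append(reg)
        page += 1 << PAGE_BITS
    return hits
```

Probing page by page is what lets a single released allocation turn up several registrations made over separate, discontiguous parts of it.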
Jeff Squyres
5ffd95e971 Commit of the unified component selection system. This was mostly
in-place already (turns out that I was wrong in thinking that it
didn't work for static components), but the logic for excluding
components was not there.  This commit does a few things:

- Adds "exclude" logic, so that you can do:

  mpirun --mca btl ^mvapi,openib ...

  (note the "^" character -- I tried "!" but then you have to escape
  it in the shell, and that was icky) which will exclude both the
  mvapi and openib btl components (excluding one component means that
  you are excluding all components in the list; it doesn't make sense
  to include some and exclude others -- you're either entirely
  including or entirely excluding)

- Simplifies the "include" logic, so the same old stuff like this
  still works:

  mpirun --mca btl tcp,self ...

  will only use the tcp and self btl components.

- Added more verbosity statements to make this selection process
  clear.

This commit was SVN r7509.
2005-09-26 21:55:32 +00:00
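The include/exclude rule from 5ffd95e971 can be sketched as follows. This is a Python illustration of the rule as the message states it, not the MCA implementation: `select_components` is a hypothetical helper, and raising an error on a mixed list is a choice made for the sketch.

```python
def select_components(spec, available):
    """spec: a framework parameter value such as 'tcp,self'
    or '^mvapi,openib'; available: list of component names."""
    if not spec:
        return list(available)
    # A leading '^' flips the whole list from include to exclude.
    exclude = spec.startswith("^")
    body = spec[1:] if exclude else spec
    names = [n.strip() for n in body.split(",") if n.strip()]
    if any("^" in n for n in names):
        # The list is either entirely include or entirely exclude.
        raise ValueError("cannot mix include and exclude in one list")
    if exclude:
        return [c for c in available if c not in names]
    return [c for c in available if c in names]
```

For example, with `tcp`, `self`, `mvapi`, `openib`, and `sm` available, the spec `^mvapi,openib` leaves `tcp`, `self`, and `sm`, and the spec `tcp,self` leaves only those two.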
Brian Barrett
1d9b663b62 * test for condition where we think we can intercept malloc/free/munmap but
really can't.  Test for munmap, since it's the most likely to cause problems,
  since it's always an interposed symbol.

  The condition that usually causes problems is if libmpi was brought in as
  the result of a library dependency, rather than as a -l on the link line.
  The linker in this case will find malloc/free/munmap/etc. in libc, rather
  than in libmpi.

This commit was SVN r7508.
2005-09-26 20:20:20 +00:00
Brian Barrett
d9e80d8f2a * increase size of event queue for receives - it was too small to be useful
on a reasonably sized machine
* if no mpool exists, don't try to malloc out an array of 0 bytes

This commit was SVN r7507.
2005-09-25 17:04:03 +00:00
Galen Shipman
384c472c94 reset ompi_pointer_array in mca_rcache_rb_find otherwise you might use an old
registration by accident.. 

This commit was SVN r7506.
2005-09-24 20:48:14 +00:00