Commit graph

6034 commits

Author SHA1 Message Date
Josh Hursey
ef51608a81 fix compiler warning
This commit was SVN r7706.
2005-10-11 22:03:21 +00:00
Edgar Gabriel
0675c22dab updating with Jeff's help to the recent autogen/configure system
This commit was SVN r7705.
2005-10-11 21:50:16 +00:00
Josh Hursey
af9ccdf04a need to use get_first instead of get_begin since we don't want to execute
this loop if "nodes" is an empty list. get_first, in this loop context, 
allows us to do just that, while get_begin doesn't.

This fixes a --host problem that appeared on the Linux PPC64 build.

This commit was SVN r7703.
2005-10-11 21:33:04 +00:00
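The get_first/get_begin distinction in r7703 can be illustrated with a toy sentinel-based list. This is only a sketch: the `toy_*` names are hypothetical and are not the project's list API. It assumes, as the commit message implies, that "begin" is a sentinel placed before the first real item, so a loop started there runs once even on an empty list.

```c
#include <stddef.h>

/* Hypothetical doubly linked list with head and tail sentinels. */
typedef struct toy_item {
    struct toy_item *next;
    struct toy_item *prev;
} toy_item_t;

typedef struct {
    toy_item_t head;   /* sentinel before the first real item */
    toy_item_t tail;   /* sentinel after the last real item */
} toy_list_t;

static void toy_list_init(toy_list_t *l)
{
    l->head.prev = NULL;
    l->head.next = &l->tail;
    l->tail.prev = &l->head;
    l->tail.next = NULL;
}

static void toy_append(toy_list_t *l, toy_item_t *it)
{
    it->prev = l->tail.prev;
    it->next = &l->tail;
    l->tail.prev->next = it;
    l->tail.prev = it;
}

/* "begin" is the head sentinel itself, never a real item. */
static toy_item_t *toy_get_begin(toy_list_t *l) { return &l->head; }

/* "first" is the first real item, or the end sentinel if the list is empty. */
static toy_item_t *toy_get_first(toy_list_t *l) { return l->head.next; }

static toy_item_t *toy_get_end(toy_list_t *l) { return &l->tail; }

/* Count how many times a loop body would run from a given start point. */
static size_t toy_count_iterations(toy_list_t *l, toy_item_t *start)
{
    size_t n = 0;
    for (toy_item_t *it = start; it != toy_get_end(l); it = it->next) {
        n++;
    }
    return n;
}
```

On an empty list, starting from `toy_get_first()` skips the loop entirely, while starting from `toy_get_begin()` visits the head sentinel once.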
Edgar Gabriel
7b07dbc163 another round of fixes. Unfortunately, I also have to provide a trivial
version of reduce and gather to make all this work....

This commit was SVN r7702.
2005-10-11 21:26:07 +00:00
Josh Hursey
8ba2900341 fixed a typo, added comments for future work
This commit was SVN r7700.
2005-10-11 20:59:31 +00:00
Tim Woodall
4a71621410 merge in scheduling changes from release branch
This commit was SVN r7699.
2005-10-11 20:41:51 +00:00
Edgar Gabriel
c8adc2e65e coding around the collective operations
This commit was SVN r7698.
2005-10-11 20:34:17 +00:00
Edgar Gabriel
083d0b9630 Checkpoint: most of the coding should be done for the basic
infrastructure.

This commit was SVN r7696.
2005-10-11 19:45:21 +00:00
Ralph Castain
e1244fc160 Fix a few thread-lock things discovered by Josh. The thread locks in the registry's local notify delivery system had not been updated to reflect the design change whereby the xcast uses the notify delivery system. This has now been fixed.
Also revised the callbacks to store and utilize local variables to avoid problems where threads modify the global structures. Not sure this totally fixes the problem, but it's a shot - suggested by Josh (and Jeff, I believe).

This commit was SVN r7694.
2005-10-11 19:35:04 +00:00
Graham Fagg
607bdf51b6 Last Cleanup BEFORE adding last two methods and final cross over points.
- new mca param calls
- move printfs to OPAL_OUTPUT

This commit was SVN r7692.
2005-10-11 18:51:03 +00:00
Edgar Gabriel
b42d4ac780 Checkpoint:
- update the hierarch stuff to use btl's instead of ptl's
- start the new logic regarding how to handle local leader communicators

This commit was SVN r7691.
2005-10-11 17:29:59 +00:00
Jeff Squyres
9a1db9abba Allow passing of argument to "make" from the nightly test suite so
that you can do parallel (and therefore much faster) builds.

This commit was SVN r7690.
2005-10-11 14:25:12 +00:00
Jeff Squyres
3652d25800 Fix for pointer math -- ensure that we're interpreting by (char *),
not (unsigned long*).  Thanks to Paul Hargrove for catching this.

This commit was SVN r7685.
2005-10-11 12:19:05 +00:00
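The kind of pointer-math error fixed in r7685 can be shown in isolation. This is a generic sketch with hypothetical helper names, not the code that was changed: adding an offset to a `(char *)` moves by bytes, while adding the same number to an `(unsigned long *)` moves by that many `unsigned long` elements.

```c
#include <stddef.h>

/* Advance by a byte count: the cast to (char *) makes "+" count bytes. */
static void *advance_bytes(void *base, size_t nbytes)
{
    return (char *)base + nbytes;
}

/* The mistaken form: the offset is silently scaled by
 * sizeof(unsigned long), so the result lands much further away. */
static void *advance_longs(void *base, size_t n)
{
    return (unsigned long *)base + n;
}
```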
Brian Barrett
4d0154fe3a * make sure to ship missing header file
This needs to go to the 1.0 branch

This commit was SVN r7684.
2005-10-11 04:20:24 +00:00
Brian Barrett
1df9b160a9 * greatly simplify the maffinity:libnuma configure macro - use the
OMPI_CHECK_PACKAGE macro instead of doing everything ourselves.
  The old code was causing problems - it wouldn't add anything to
  WRAPPER_EXTRA_{LDFLAGS, LIBS} if libnuma was installed in /usr,
  so it didn't work so well.

  This should go to the 1.0 branch

This commit was SVN r7683.
2005-10-11 01:37:45 +00:00
Jeff Squyres
5c98bbeae6 Update to docs
This commit was SVN r7682.
2005-10-10 19:13:54 +00:00
Ralph Castain
a47655b3fd Add unlock/lock around the delivery of a local callback to remove thread-lock condition if the callback function attempts to re-enter the registry.
This commit was SVN r7678.
2005-10-10 02:45:50 +00:00
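The unlock/lock pattern described in r7678 might look roughly like the sketch below. All names (`registry_lock`, `registry_put`, `registry_deliver`, `sample_callback`) are hypothetical stand-ins, not the registry's actual functions. The point is only that a non-recursive mutex must be released before invoking a callback that may call back into a locking entry point.

```c
#include <pthread.h>

typedef void (*notify_cb_t)(void *cbdata);

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static int registry_value = 0;
static int deliveries = 0;

/* Public entry point that takes the lock; callbacks may call it. */
static void registry_put(int value)
{
    pthread_mutex_lock(&registry_lock);
    registry_value = value;
    pthread_mutex_unlock(&registry_lock);
}

static int registry_get(void)
{
    int v;
    pthread_mutex_lock(&registry_lock);
    v = registry_value;
    pthread_mutex_unlock(&registry_lock);
    return v;
}

/* Deliver a local callback without holding the lock across the call. */
static void registry_deliver(notify_cb_t cb, void *cbdata)
{
    pthread_mutex_lock(&registry_lock);
    deliveries++;                           /* bookkeeping under the lock */

    pthread_mutex_unlock(&registry_lock);   /* release before the callback */
    cb(cbdata);                             /* may re-enter registry_put() */
    pthread_mutex_lock(&registry_lock);     /* re-acquire afterwards */

    pthread_mutex_unlock(&registry_lock);
}

/* Example callback that re-enters the registry. */
static void sample_callback(void *cbdata)
{
    registry_put(*(int *)cbdata);
}
```

Without the unlock before `cb(cbdata)`, `sample_callback` would block forever inside `registry_put()` on a default (non-recursive) mutex.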
Ralph Castain
6c839048cf Fix a typo that caused valgrind to bark on 64-bit machines. Actually was a potential source of error, so the barking was legit.
This commit was SVN r7677.
2005-10-10 02:34:26 +00:00
Galen Shipman
23cbac25c8 lower default free list sizes..
This commit was SVN r7676.
2005-10-09 18:15:12 +00:00
Josh Hursey
d5ebb5c46a fix a compiler warning
This commit was SVN r7674.
2005-10-08 17:03:12 +00:00
Jeff Squyres
982e34c50d Update the dist script a bit
This commit was SVN r7672.
2005-10-08 10:53:06 +00:00
Jeff Squyres
66a8381765 Increase expected tool versions to all the latest versions
This commit was SVN r7670.
2005-10-08 02:55:34 +00:00
Jeff Squyres
becb2b4082 Bump the trunk up to 1.1
This commit was SVN r7669.
2005-10-08 02:39:41 +00:00
Jeff Squyres
670a82e651 Mark this function as deprecated.
This commit was SVN r7665.
2005-10-07 22:27:27 +00:00
Jeff Squyres
0629cdc2d7 Bring back the changes from /tmp/jjhursey-rmaps. Specific merge
command:

svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps .

(where "." is a trunk checkout)

The logs from this branch are much more descriptive than I will put
here (including a *really* long description from last night).  Here's
the short version:

- fixed some broken implementations in ras and rmaps
- "orterun --host ..." now works and has clearly defined semantics
  (this was the impetus for the branch and all these fixes -- LANL had
  a requirement for --host to work for 1.0)
- there is still a little bit of cleanup left to do post-1.0 (we got
  correct functionality for 1.0 -- we did not fix bad implementations
  that still "work")
  - rds/hostfile and ras/hostfile handshaking
  - singleton node segment assignments in stage1
  - remove the default hostfile (no need for it anymore with the
    localhost ras component)
  - clean up pls components to avoid duplicate ras mapping queries
  - [possible] -bynode/-byslot being specific to a single app context 

This commit was SVN r7664.
2005-10-07 22:24:52 +00:00
Galen Shipman
fb19cc4177 compiler warning fixes..
This commit was SVN r7661.
2005-10-07 17:38:34 +00:00
Tim Woodall
3c900a7aa2 - fix a deadlock on threaded build
- update sequence number after a partial write completes

This commit was SVN r7654.
2005-10-06 21:50:58 +00:00
Tim Woodall
a79e07390a remove debug
This commit was SVN r7653.
2005-10-06 21:29:47 +00:00
Tim Woodall
2ea71064ad close all file descriptors w/ the exception of stdin/stdout/stderr
otherwise, parent's file descriptors are inherited and held open by
the child even if the parent dies

This commit was SVN r7652.
2005-10-06 21:22:36 +00:00
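A minimal sketch of the descriptor cleanup described in r7652, assuming a POSIX system. The function name and the explicit `max_fd` parameter are illustrative choices, not taken from the commit.

```c
#include <unistd.h>

/* Close every descriptor above stderr and below max_fd.
 * Returns how many descriptors were actually open and got closed. */
static int close_fds_above_stderr(int max_fd)
{
    int closed = 0;
    for (int fd = STDERR_FILENO + 1; fd < max_fd; fd++) {
        if (close(fd) == 0) {
            closed++;   /* close() fails with EBADF for unused slots */
        }
    }
    return closed;
}
```

A child process would typically call this after `fork()` and before `exec`, passing something like `(int)sysconf(_SC_OPEN_MAX)` as the upper bound, so that it does not keep the parent's descriptors open after the parent dies.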
Tim Woodall
797922fbab - cleanup on loss of connection to peer
- generate ack if no one to forward msg to

This commit was SVN r7651.
2005-10-06 21:21:26 +00:00
Tim Woodall
3280f6e655 add facility to receive callback on disconnection from peer
This commit was SVN r7650.
2005-10-06 19:39:20 +00:00
Jeff Squyres
b22fab2826 Fix for a bug Galen noticed yesterday -- make the shared memory only
be allocated the first time a sm coll is selected for a communicator,
not before.

This commit was SVN r7647.
2005-10-06 13:17:27 +00:00
George Bosilca
1fe18814da Decrease the default length for the first fragment.
This commit was SVN r7643.
2005-10-06 00:05:01 +00:00
George Bosilca
0f04132b13 mx_connect in the MX documentation is supposed to take a timeout in seconds. However, in real life it seems that the timeout should be in microseconds.
This commit was SVN r7642.
2005-10-06 00:04:27 +00:00
Andrew Friedley
b1af69dfe7 Don't check for errors on the paffinity stuff, as per Brian's request.
This commit was SVN r7640.
2005-10-05 18:08:06 +00:00
Brian Barrett
b7ef094766 * the cid in the header is only 16 bits, so limit our max cid to what can fit in there.
This commit was SVN r7639.
2005-10-05 15:43:28 +00:00
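The constraint in r7639 can be sketched as a range check before a communicator id is stored in a 16-bit header field. The struct layout and names here are hypothetical; only the 16-bit width comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wire header with a 16-bit context id field. */
typedef struct {
    uint16_t hdr_cid;
} toy_hdr_t;

/* Largest cid that can be represented in the header. */
#define TOY_MAX_CID UINT16_MAX

/* Store cid in the header; refuse values that would be truncated. */
static bool toy_hdr_set_cid(toy_hdr_t *hdr, uint32_t cid)
{
    if (cid > TOY_MAX_CID) {
        return false;
    }
    hdr->hdr_cid = (uint16_t)cid;
    return true;
}
```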
Andrew Friedley
37123ed430 Implement an opal_show_help() (like is done in ompi_mpi_init) for error handling in opal_init and both stages of orte_init.
Some of the functions in opal_init are void or return a bool (opal_output_init, but always returns true.. eh?), so I don't check them.

This commit was SVN r7638.
2005-10-05 13:56:35 +00:00
Jeff Squyres
65f1adfedc Add "-tv" option to orterun:
orterun -tv -np 4 foo

which will turn around and re-exec:

      totalview orterun -a -np 4 foo

This commit was SVN r7636.
2005-10-05 10:24:34 +00:00
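The argument rewrite described in r7636 could be sketched as follows. This is an illustration of the transformation shown in the commit message only; the function name is hypothetical and the real orterun code may handle the option differently.

```c
#include <stdlib.h>
#include <string.h>

/* Build "totalview <argv[0]> -a <remaining args without -tv>".
 * Returns a NULL-terminated, malloc'd vector whose strings are borrowed
 * from argv, or NULL if "-tv" is absent or allocation fails. */
static char **build_totalview_argv(int argc, char **argv)
{
    int found = 0;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-tv") == 0) {
            found = 1;
        }
    }
    if (!found) {
        return NULL;
    }

    /* Two extra words plus the terminating NULL; one slot is spare
     * because at least one "-tv" is dropped. */
    char **out = malloc((size_t)(argc + 3) * sizeof(*out));
    if (out == NULL) {
        return NULL;
    }

    int n = 0;
    out[n++] = "totalview";
    out[n++] = argv[0];
    out[n++] = "-a";
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-tv") != 0) {
            out[n++] = argv[i];
        }
    }
    out[n] = NULL;
    return out;
}
```

The resulting vector could then be handed to `execvp(out[0], out)` to perform the re-exec.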
Jeff Squyres
65698bc6be Remove compiler warning
This commit was SVN r7635.
2005-10-05 10:23:02 +00:00
Jeff Squyres
3a225cf01a Clarify some documentation comments
This commit was SVN r7631.
2005-10-05 03:02:11 +00:00
Jeff Squyres
0f100d8577 - Don't overwrite rc with the return value from pls_tm_disconnect --
it's always ORTE_SUCCESS and sometimes masks real !=ORTE_SUCCESS rc
  values. 
- Add MCA param pls_tm_want_path_check.  If nonzero (the default),
  check for the orted in the PATH before each tm_spawn()'ing (doing a
  little caching so that we don't hammer on the filesystem -- remember
  all the PATH's where we successfully found the orted so that we
  don't have to query the filesystem multiple times for a PATH where
  we previously found the orted)
- Be sure to opal_argv_split() the pls_tm_orted MCA param

This commit was SVN r7625.
2005-10-04 19:38:51 +00:00
Jeff Squyres
b79c46dbf6 Downgrade the default priority to 75, just to give leeway (same as the
slurm pls).

This commit was SVN r7624.
2005-10-04 19:18:52 +00:00
Brian Barrett
11932e89d5 Fix nasty bug Josh was seeing where cntl-c wasn't working when orterun hangs up
in finalize after the first cntl-c is sent.

If the signal delete function was called while the signal set was blocked (which
is really most of the time in OMPI), the signal handler would be set to default
and the event library sigset modified, but the signal would never be unblocked.
Now we unblock the signal after resetting the handler...

This commit was SVN r7623.
2005-10-04 19:01:13 +00:00
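The fix described in r7623 amounts to two POSIX steps: restore the default disposition, then unblock the signal. The sketch below shows that sequence with a hypothetical helper name; it is not the event library's actual code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/* Reset signo to its default action and make sure it is deliverable. */
static int reset_and_unblock(int signo)
{
    struct sigaction sa;
    sigset_t set;

    sa.sa_handler = SIG_DFL;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0) {
        return -1;
    }

    /* Without this step the signal stays blocked, so a second
     * Ctrl-C would never reach the (now default) handler. */
    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}
```

In a multithreaded program `pthread_sigmask()` would be the appropriate call instead of `sigprocmask()`.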
Jeff Squyres
eb24fe4fd8 If the job fails to launch properly, set its state to ABORTED, which
will fire some subscriptions that will eventually result in invoking
terminate_job (i.e., terminate anything that may have been
successfully started by launch).

This commit was SVN r7622.
2005-10-04 17:19:23 +00:00
Jeff Squyres
83b5a675f9 Don't automatically take the first entry off the selected component
list; be sure to check its priority against the basic component and
take the one with the higher priority.

This commit was SVN r7621.
2005-10-04 17:09:45 +00:00
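The selection rule in r7621 can be sketched as a simple priority comparison. The types and names are hypothetical, and the tie-breaking choice (the basic component wins on equal priority) is an assumption, since the commit message does not say what happens on a tie.

```c
#include <stddef.h>

typedef struct {
    const char *name;
    int priority;
} toy_component_t;

/* Pick between the first entry of the selected list and the basic
 * component, preferring the strictly higher priority. */
static const toy_component_t *pick_component(const toy_component_t *first_selected,
                                             const toy_component_t *basic)
{
    if (first_selected == NULL) {
        return basic;
    }
    if (basic == NULL) {
        return first_selected;
    }
    /* Assumption: on a tie, fall back to the basic component. */
    return (first_selected->priority > basic->priority) ? first_selected : basic;
}
```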
George Bosilca
967cd1be32 Make the datatype compile on solaris.
Remove some warnings ...

This commit was SVN r7619.
2005-10-04 15:45:18 +00:00
Jeff Squyres
80399aff17 Add some README's to describe what these components are for.
This commit was SVN r7618.
2005-10-04 15:14:23 +00:00
Jeff Squyres
3df0828921 Restore this PLS -- LANL needs this for some of its older clusters.
This commit was SVN r7617.
2005-10-04 15:09:38 +00:00
Jeff Squyres
b17c4334c4 - Remove all vestiges of using the built-in mcb_tree from the
reduce_inorder() function -- we don't use the tree at all.
- Add more relevant "volatile"'s for the control buffers in the
  fragment mpool (and associated casts where necessary)

This commit was SVN r7616.
2005-10-04 14:52:59 +00:00
George Bosilca
9a67831ba3 Always call the memory allocation function with the correct type for the first argument. The problem is
that on some OSes the iovec struct is not POSIX compliant: the iov_len is not a size_t but simply an int.
This patch adds a local variable (type size_t) to use with the memory allocation function, and then puts
the value back in the iov_len field.

This commit was SVN r7615.
2005-10-04 14:44:59 +00:00
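The workaround in r7615 can be sketched as below. The allocator `toy_alloc_rounded` is hypothetical and merely stands in for an allocation function that takes its length by `size_t *`. The point is that a pointer to `iov_len` is never passed directly, so the code stays correct even where `iov_len` is not a `size_t`.

```c
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical allocator: rounds *len up to a multiple of 8 and
 * reports the size actually allocated back through the pointer. */
static void *toy_alloc_rounded(size_t *len)
{
    *len = (*len + 7u) & ~(size_t)7u;
    return malloc(*len);
}

/* Fill one iovec entry using a genuine size_t temporary. */
static int toy_fill_iovec(struct iovec *iov, size_t wanted)
{
    size_t length = wanted;   /* always a real size_t */

    iov->iov_base = toy_alloc_rounded(&length);
    if (iov->iov_base == NULL) {
        return -1;
    }
    /* Copy back by value; the assignment converts to whatever
     * type iov_len has on this platform. */
    iov->iov_len = length;
    return 0;
}
```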