1
1
Граф коммитов

9039 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
0ce78f22fa Compliments r13351 -- use the new field on the ompi_proc_t struct to
know what my local rank is, and therefore set my paffinity ID as
appropriate.  Specifically, we're no longer relying on the
special/secret mpi_paffinity_processor MCA parameter that the orted
would set for us.

This allows processor affinity to be used in environments where the
orted is not used (e.g., bproc, and someday in the hopefully not
too-distant future, SLURM).

This commit was SVN r13352.

The following SVN revision numbers were found above:
  r13351 --> open-mpi/ompi@a338b7e533
2007-01-29 21:53:04 +00:00
Ralph Castain
a338b7e533 Update the proc system to compute and store the number of local procs and my local relative rank during the callback at STG1. This info will be used to locally compute and set the processor affinity and sched_yield (for oversubscribed conditions).
Over to Jeff now for modifying mpi_init accordingly.

Until Jeff makes his changes, nobody should see anything different as the new info just isn't used by anything!

This commit was SVN r13351.
2007-01-29 20:54:55 +00:00
Ralph Castain
94590366f5 Update the properties
This commit was SVN r13350.
2007-01-29 20:26:11 +00:00
Jeff Squyres
c9f072b84f Strike down a few more stray places that were registering
mpi_leave_pinned and replace them with the one central global
variable.

This commit was SVN r13349.
2007-01-29 20:24:31 +00:00
Brian Barrett
93a2f31932 Use a recursive halving communication algorithm similar to the one used by
MPICH2 for "small" commutative operations in the reduce_scatter basic
implementation.  "small" is currently pretty big, as it doesn't take
much to beat reduce/scatterv.  Need to do much more than this for
better all around performance of MPI_Reduce_scatter, but this was enough
to solve the problems I was having.

This commit was SVN r13348.
2007-01-29 19:29:35 +00:00
Jeff Squyres
61bc9fed3c Duh. Free the temp array when we're done with it. While we're making
this switch, use a clear variable name, too.

This commit was SVN r13347.
2007-01-29 19:04:57 +00:00
Jeff Squyres
e6a0069e95 Fix from an as-yet uncommitted test that Pak will commit shortly.
Found another places that we were incorrectly casting a C++ MPI handle
array to the corresponding C array type and hoping for the best (which
won't work at all).  This commit fixes things so that we now do the
proper conversion between C<-->C++ handles.

This commit was SVN r13346.
2007-01-29 18:44:46 +00:00
Rainer Keller
ca35881cd0 - Minor bugfixes and removed compiler warnings
This commit was SVN r13343.
2007-01-28 19:52:09 +00:00
George Bosilca
38fd8ffb54 The intended patch.
This commit was SVN r13342.
2007-01-28 04:55:05 +00:00
Jeff Squyres
c522f6f997 Tim pointed out that the original value was probably not what was
intended.  In truth, any unique value will do, but he's right that
having a regular, recognizeable, repeatable value is probably much
better for debugging purposes.

This commit was SVN r13341.
2007-01-28 00:46:25 +00:00
Rich Graham
f6c99d0207 set orte_odls_base.components_available to false if no odls components are
available.  Startup now works if no odls components are availble.

This commit was SVN r13339.
2007-01-27 15:37:13 +00:00
Jeff Squyres
3c5c8c3c4c Refinement of Rainer's r13227 and r13228 (worked with Rainer, Ralph,
and George on these refinements):

 * Rename the static OBJ initializer macro to be
   OPAL_OBJ_STATIC_INIT(class)
 * Ensure that all static OBJ initializations get a refcount of 1
   (doesn't ''really'' matter, since they're static, it should never
   get to the point where the OBJ is DESTRUCTed, but more correct
   nonetheless)
 * Add a "magic number" to the OBJ when compiling with debug support.
   The magic number does some rudimentary support to ensure that
   you're operating on a valid OBJ (and fails an assertion if you're
   not).  Check to ensure that the memory contains the magic number
   when performing actions of OBJ's.  Also remove the magic number
   when DESTRUCTing OBJs, so that if, for example, an OBJ is
   DESTRUCTed more than once, we'll fail the magic number assert.

This commit was SVN r13338.

The following SVN revision numbers were found above:
  r13227 --> open-mpi/ompi@96030de97b
  r13228 --> open-mpi/ompi@c2e9075d29
2007-01-27 13:44:03 +00:00
Jelena Pjesivac-Grbovic
33dcb4f810 Minor change to linear alltoall algorithm:
- post isends in reverse order of posting irecvs.
if the messages arrive approximately in order, this should 
minimize the time spent in matching the requests.

I did not see any performance difference over MX up to 64 nodes, but 
the change makes sense and may have some impact when we have (many) 
more nodes.

This commit was SVN r13337.
2007-01-26 21:59:31 +00:00
George Bosilca
e305ad91bd No comment.
This commit was SVN r13326.
2007-01-26 18:24:12 +00:00
Jeff Squyres
974dcebf9f Finish backing out r13316 by also removing the comments that it
insertted.

This commit was SVN r13324.

The following SVN revision numbers were found above:
  r13316 --> open-mpi/ompi@35c1370a13
2007-01-26 13:09:18 +00:00
George Bosilca
668a2bd7ac Remove some debug output.
This commit was SVN r13323.
2007-01-26 08:09:22 +00:00
George Bosilca
bd7eebda83 Deal with the argv problem from r13321 for the Windows PLS.
This commit was SVN r13322.

The following SVN revision numbers were found above:
  r13321 --> open-mpi/ompi@b439e87f96
2007-01-26 07:21:07 +00:00
George Bosilca
b439e87f96 We have this one starting from r12059. We save a pointer to the argv[*] and
then we modify the argv, forcing the reallocation of the array. With luck
the saved pointer still have a meaning ... without execve return with error
14 (EFAULT).

This commit was SVN r13321.

The following SVN revision numbers were found above:
  r12059 --> open-mpi/ompi@ae79894bad
2007-01-26 07:06:52 +00:00
George Bosilca
29597cf0c5 We need to initialize the ODLS as they are the only one to define
the ORTE_DAEMON_CMD type. Which, unfortunately, is used all over
the place. Without this, we get error:
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../ompi-trunk/orte/dss/dss_pack.c at line 83
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../ompi-trunk/orte/dss/dss_pack.c at line 58
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../../../ompi-trunk/orte/mca/pls/base/pls_base_orted_cmds.c at line 136

This commit was SVN r13320.
2007-01-26 04:32:15 +00:00
Ralph Castain
0905dfdfba Make sure the params.h file gets included in the tarballs
This commit was SVN r13318.
2007-01-26 03:05:30 +00:00
Brian Barrett
385a435813 Start long message send as soon as possible, to minimze ack time for the receive,
greatly increasing mid-range bandwidth

This commit was SVN r13317.
2007-01-25 23:07:03 +00:00
Rich Graham
35c1370a13 odls components are handled only by daemon procs.
This commit was SVN r13316.
2007-01-25 21:18:59 +00:00
Rich Graham
3488b394be fix typo in name of the cancel operation.
This commit was SVN r13312.
2007-01-25 19:07:27 +00:00
Rich Graham
1c20feb52b Take into account constants that in the cray headers are defined different than in the portals spec.
This commit was SVN r13311.
2007-01-25 18:32:47 +00:00
Jeff Squyres
580a7a108c Fix a compiler warning.
This commit was SVN r13310.
2007-01-25 17:22:01 +00:00
Tim Prins
e199bf9b64 Refs trac:801
Fix compiler warning

This commit was SVN r13308.

The following Trac tickets were found above:
  Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801
2007-01-25 16:12:05 +00:00
Brian Barrett
3c11936b34 Minor changes to the distscript:
- The Verification check only checked that a file that's in SVN is there,
    which AM would have complained about during make dist, so it's really
    a pointless check
  - No need to remove / restore autogen.sh, as AM isn't going to put it
    in the tarball anyway, and even if it would, this thing would only
    cause it to fail during make dist.  All this step did was erase any
    changes you had to autogen.sh when you run make_dist_tarball, which
    really sucks.

This commit was SVN r13307.
2007-01-25 15:53:08 +00:00
Ralph Castain
ab5ea61100 Bring over the rest of the ctrl-c fixes. This commit includes:
1. add a "cancel_operation" API to the pls components that allows orterun to demand that an orted operation (e.g., terminate_job) be immediately cancelled and abandoned.

2. changes the pls orted commands from blocking to non-blocking. This allows us to interrupt those operations should an orted be non-responsive. The change also adds an orte_abort_timeout that limits how long orterun will automatically wait for the orteds to respond - if the terminate command, for example, doesn't see orted response within that time, then we printout an appropriate error message and just give up.

3. modifies orterun to allow multiple ctrl-c's to simply abort the program even if the orteds have not responded

4. does some cleanup on the orte-level mca params so that their implementation looks a lot more like that of ompi - makes it easier to maintain. This change also includes the definition of an orte_abort_timeout struct and associated MCA param (can't have too many!) so you can set the time after which orterun gives up on waiting for orteds to respond

This needs more testing before migrating to 1.2.

This commit was SVN r13304.
2007-01-25 14:17:44 +00:00
Jeff Squyres
7b6ed64c7b Add in the hostname to the BTL_* output macros so that you can tell on
which node an event occurred.

This commit was SVN r13302.
2007-01-25 14:02:54 +00:00
Ralph Castain
53967bd698 Fix a memory corruption problem deep inside the registry when subscriptions/triggers are processed. The create_value function will malloc space for the pointers to keyval objects, but doesn't actually allocate space for the objects themselves. When constructing the gpr_notify_data object, we forgot to OBJ_NEW the keyval objects. Since the create_value function didn't explicitly NULL those memory locations, it just so happened that there was a non-NULL address in them....which we dutifully dumped a keyval into.
This fix includes two parts: (a) we now initialize the keyval pointer locations to NULL after the malloc, and (b) we now OBJ_NEW the keyvals prior to storing info in them.

BTW, in case anyone reads this and wonders why we don't just OBJ_NEW the keyvals in create_value, the reason is simply that some places in the code use static keyvals and simply assign those addresses into the value object's array. So not everyone wants to OBJ_NEW keyvals - by not forcing it here in create_value, we give the user the flexibility to do whatever they want.

This commit was SVN r13300.
2007-01-25 12:54:02 +00:00
George Bosilca
f9a3bbfd7a Don't miss the ODLS component from the output.
This commit was SVN r13299.
2007-01-25 08:37:36 +00:00
George Bosilca
5711583bdf Force only one thread to come out from the
socket engine.

This commit was SVN r13298.
2007-01-25 07:36:42 +00:00
George Bosilca
dcce444ed4 When the user give a prefix that really means something. I expect to
start looking for the daemons using the prefix.

This commit was SVN r13297.
2007-01-25 07:35:25 +00:00
Jeff Squyres
b9120aed6d Always make sure to create $(includedir)/openmpi, even if we were
configured with --disable-mpi-cxx so that the default -I flags in the
wrapper compilers don't point to a directory that doesn't exist.
Thanks to Martin Audet for identifying the problem.

This commit was SVN r13296.
2007-01-25 01:59:22 +00:00
George Bosilca
950b07d860 Work around the Windows sockets model.
This commit was SVN r13294.
2007-01-25 00:19:02 +00:00
George Bosilca
3b988fcdfd Small update the the process PLS.
This commit was SVN r13293.
2007-01-25 00:17:54 +00:00
George Bosilca
b6307807d8 Export opal_path_is_absolute which correctly compute if a path is
absolute or not.

This commit was SVN r13292.
2007-01-25 00:17:02 +00:00
George Bosilca
a5b13f824d A more Windows friendly version of the opal_basename function.
This commit was SVN r13291.
2007-01-25 00:15:56 +00:00
Jeff Squyres
6fea000e5f Oops -- get the right function name (copy-n-paste error).
This commit was SVN r13290.
2007-01-24 22:31:13 +00:00
Jeff Squyres
6b69ea664d Make a much, much better error message for a not-uncommon failure
scenario (user/sysadmin forgot to set the memlock limits high
enough).

This commit was SVN r13289.
2007-01-24 22:25:40 +00:00
Patrick Geoffray
b252cb82c8 oops, ".", not "->", copy error...
This commit was SVN r13287.
2007-01-24 19:16:46 +00:00
Patrick Geoffray
d58f6b2451 Free memory in synchronous send case if free_after requires it.
Fixes memory leak using synchronous sends and custom data types.

This commit was SVN r13286.
2007-01-24 19:10:38 +00:00
Jeff Squyres
2edb76f902 Sync against 1.1 NEWS.
This commit was SVN r13281.
2007-01-24 17:58:52 +00:00
Jeff Squyres
5d80458563 Fix typo.
This commit was SVN r13277.
2007-01-24 17:40:02 +00:00
Brian Barrett
4e990ac0e9 Work around structure size / packing issue on 64 bit Intel Mac builds. Long
comment explaining the patch in the patch.

Refs trac:574

This commit was SVN r13276.

The following Trac tickets were found above:
  Ticket 574 --> https://svn.open-mpi.org/trac/ompi/ticket/574
2007-01-24 17:31:49 +00:00
Brian Barrett
6919ccfd2b Don't pass up slave devices in a channel bonding situation as valid
network interfaces, as they shouldn't directly be used

This commit was SVN r13272.
2007-01-24 16:39:16 +00:00
Tim Prins
4fd81b3407 Fixes trac:801
- Make it so the SLURM ras can handle different nodelist configurations
- Some code cleanup and better/more informative error messages and error handling

This commit was SVN r13271.

The following Trac tickets were found above:
  Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801
2007-01-24 14:45:42 +00:00
George Bosilca
658f411ba4 As there is no X_OK on Windows, make it equal to R_OK.
This commit was SVN r13270.
2007-01-24 05:36:02 +00:00
George Bosilca
6c8bbb423e Replace TAB by space.
This commit was SVN r13269.
2007-01-24 00:53:12 +00:00
George Bosilca
9b16827049 Add ORTE_DECLSPEC, and few conversions.
This commit was SVN r13268.
2007-01-24 00:52:08 +00:00