Commit graph

16407 commits

Author SHA1 Message Date
Ralph Castain
6cbd8fa6c9 Keep everyone in sync with new job state
This commit was SVN r25563.
2011-12-02 14:12:40 +00:00
Ralph Castain
07655e2945 Handle the case where the allocator "fibs" to us about the node names. In some cases (ahem...you know who you are!), the allocator will tell us a node number (e.g., "16"). However, the daemon will return a node name (e.g., "nid0016") - leaving us not recognizing its location.
So provide a new parameter (can't have too many!) that handles this situation by stripping the prefix from the returned node name. Also do a little cleanup to ensure we cleanly exit from errors, without generating too many annoying messages.

This commit was SVN r25562.
2011-12-02 14:10:08 +00:00
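
The prefix stripping described in commit 07655e2945 can be sketched as follows. This is a minimal illustration, not the Open MPI implementation: the helper names `strip_node_prefix` and `same_node` are hypothetical, and the sketch assumes the prefix is any leading run of non-digit characters followed by zero padding.

```python
def strip_node_prefix(name: str) -> str:
    """Reduce a daemon-reported name such as 'nid0016' to the bare number '16'."""
    start = 0
    # Skip the leading non-digit prefix (assumed form, e.g. "nid").
    while start < len(name) and not name[start].isdigit():
        start += 1
    remainder = name[start:]
    if not remainder:
        # No numeric part at all: leave the name untouched.
        return name
    # Drop zero padding, but keep a single "0" for an all-zero number.
    return remainder.lstrip("0") or "0"


def same_node(allocated: str, reported: str) -> bool:
    """Compare the allocator's name with the daemon's name after stripping."""
    return strip_node_prefix(allocated) == strip_node_prefix(reported)
```

With this sketch, the allocator's "16" and the daemon's "nid0016" compare equal, while "16" and "nid0160" do not.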
Rolf vandeVaart
bdc7f7a4ef Add check for version of CUDA. Not used yet, but will be in future.
This commit was SVN r25561.
2011-12-02 13:35:20 +00:00
Jeff Squyres
ecf6ba910c Silence a few icc warnings about mixing enums with other types.
This commit was SVN r25560.
2011-12-02 13:18:54 +00:00
Ralph Castain
357ac14530 Can't return a numerical value here
This commit was SVN r25559.
2011-12-02 10:36:57 +00:00
Ralph Castain
9af80be432 Add missing platform files to tarball
This commit was SVN r25558.
2011-12-01 18:05:16 +00:00
Ralph Castain
641e17f26c A better way of handling fqdn allocations. Prior method was wrong as it equated "node1" with "node10", which definitely caused problems.
Detect the addition of fqdn nodes in the allocation. If not found, then strip all incoming hostnames from daemons of any domain info when matching those names against the names in the node pool.

Leave some protection and "live" diagnostic output in place so we can continue to detect problems across all environments.

This commit was SVN r25557.
2011-12-01 14:24:43 +00:00
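
The matching rule from commit 641e17f26c can be sketched as below. This is an illustration under stated assumptions, not the actual code: `match_node` is a hypothetical name, a name counts as fully qualified when it contains a dot, and the earlier "node1" versus "node10" confusion is assumed to have come from a prefix-style comparison.

```python
from typing import List, Optional


def match_node(reported: str, pool: List[str]) -> Optional[str]:
    """Return the pool entry that matches a daemon-reported hostname."""
    # Detect whether the allocation itself uses fully qualified names.
    pool_has_fqdn = any("." in name for name in pool)
    # If it does not, strip domain info from the incoming hostname.
    candidate = reported if pool_has_fqdn else reported.split(".", 1)[0]
    for name in pool:
        # Exact comparison: a prefix test would equate "node1" and "node10".
        if name == candidate:
            return name
    return None
```

A prefix comparison such as `"node10".startswith("node1")` is true, which is the kind of false match the exact comparison avoids.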
Ralph Castain
512aea79bc Print the right nodename value, fix the strange case
This commit was SVN r25556.
2011-12-01 02:31:56 +00:00
Ralph Castain
44394c6b34 Add a little more protection
This commit was SVN r25555.
2011-12-01 00:30:56 +00:00
Ralph Castain
c4ea7a252a Add a little protection against badly formed node names so we don't segfault if they are encountered
This commit was SVN r25554.
2011-11-30 23:33:59 +00:00
Nathan Hjelm
bb1fec0407 added put/get btl descriptor flags
This commit was SVN r25553.
2011-11-30 21:37:23 +00:00
Ralph Castain
fa9e99454a Don't divide by cpus-per-task - we'll deal with that at binding time.
This commit was SVN r25552.
2011-11-30 21:35:25 +00:00
Ralph Castain
c56acf60ca Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order.
Turns out, this isn't necessarily true. The Cray, for example, launches processes in a toroidal pattern, thus causing the daemons to wind up somewhere other than what we thought. Other environments (e.g., slurm) are also capable of such behavior, depending upon the default mapping algorithm they are told to use.

Resolve this problem by making the daemon-to-node assignment in the affected environments when the daemon calls back and tells us what node it is on. Order the nodes in the mapping list so they are in daemon-vpid order as opposed to the order in which they show in the allocation. For environments that don't exhibit this mapping behavior (e.g., rsh), this won't have any impact.

Also, clean up the vm launch procedure a little bit so it more closely aligns with the state machine implementation that is coming, and remove some lingering "slave" code.

This commit was SVN r25551.
2011-11-30 19:58:24 +00:00
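
The reordering described in commit c56acf60ca can be sketched as follows. The function name and data shapes are hypothetical: the sketch assumes each daemon reports its vpid together with the node it landed on, and simply orders nodes by vpid instead of by allocation order.

```python
from typing import Dict, List


def order_nodes_by_daemon_vpid(reports: Dict[int, str]) -> List[str]:
    """Build the mapping list from daemon callbacks.

    `reports` maps a daemon vpid to the node name that daemon said it is on.
    The result lists nodes in daemon-vpid order, regardless of where each
    node appeared in the original allocation.
    """
    return [reports[vpid] for vpid in sorted(reports)]


# Allocation order was n0, n1, n2, n3, but the launcher placed daemons elsewhere.
callbacks = {2: "n3", 0: "n2", 3: "n1", 1: "n0"}
mapping_list = order_nodes_by_daemon_vpid(callbacks)
```

In an environment that launches daemons in allocation order (rsh, per the commit message), the result is the same as the allocation order, so the change has no effect there.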
George Bosilca
4b7e3b0af8 Correctly generate the raw description in the convertor. Advance
by the extent and not by the length of the contiguous segment.

This commit was SVN r25550.
2011-11-30 00:14:47 +00:00
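
The distinction in commit 4b7e3b0af8 between advancing by the extent and advancing by the segment length can be illustrated with a small sketch. The function and its parameters are hypothetical and model only the simplest case: a datatype with one contiguous segment of `seg_len` bytes, repeated `count` times with a stride of `extent` bytes.

```python
from typing import List, Tuple


def raw_segments(base: int, count: int, seg_len: int, extent: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for `count` repetitions of one segment.

    Each repetition starts `extent` bytes after the previous one. Advancing
    by `seg_len` instead would be wrong whenever the extent includes padding.
    """
    return [(base + i * extent, seg_len) for i in range(count)]
```

For example, three elements with 8 bytes of data and a 16-byte extent start at offsets 0, 16 and 32, not 0, 8 and 16.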
George Bosilca
25476c7e54 buffer is not yet initialized, so there is no reason to release it.
This commit was SVN r25549.
2011-11-29 23:50:18 +00:00
George Bosilca
2589e55a75 item_in_tree is only used in debug mode, so protect it.
This commit was SVN r25548.
2011-11-29 23:48:26 +00:00
Jeff Squyres
d71492108c (this is what r25545 should have been)
Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php,
to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS
X, we need to use the following compiler (linker) flag:

    -Wl,-commons,use_dylibs 

So if we're compiling on OS X, test to see if that flag works with the
compiler. If so, add it to the wrapper FFLAGS and FCFLAGS (note that
per a future update, we'll only have one Fortran compiler anyway).

Fixes trac:1982. 

This commit was SVN r25547.

The following SVN revision numbers were found above:
  r25545 --> open-mpi/ompi@7f9ae11faf

The following Trac tickets were found above:
  Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982
2011-11-29 23:28:38 +00:00
Jeff Squyres
6fbbfd0f7a Gah! r25545 accidentally included ''waaaay'' more stuff than it was
supposed to.  I.e., half-baked/not complete stuff.

This commit backs out all of r25545.  Sorry folks!

This commit was SVN r25546.

The following SVN revision numbers were found above:
  r25545 --> open-mpi/ompi@7f9ae11faf
2011-11-29 23:24:52 +00:00
Jeff Squyres
7f9ae11faf Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php,
to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS
X, we need to use the following compiler (linker) flag:

    -Wl,-commons,use_dylibs 

So if we're compiling on OS X, test to see if that flag works with the
compiler.  If so, add it to the wrapper FFLAGS and FCFLAGS (note that
per a future update, we'll only have one Fortran compiler anyway).

Fixes trac:1982.  

This commit was SVN r25545.

The following Trac tickets were found above:
  Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982
2011-11-29 23:05:54 +00:00
George Bosilca
7a238933b6 Silence a compiler warning.
This commit was SVN r25543.
2011-11-29 20:53:08 +00:00
Jeff Squyres
21dc0b44e1 Fix minor typo in comment
This commit was SVN r25542.
2011-11-29 20:39:53 +00:00
Jeff Squyres
96a0f6d78b Sync with 1.4 NEWS.
This commit was SVN r25541.
2011-11-29 20:34:13 +00:00
Jeff Squyres
e21eec785d Sync with NEWS on v1.4 branch
This commit was SVN r25537.
2011-11-29 19:59:42 +00:00
Terry Dontje
5209de048c add code to service_thread_start to handle EBADF returns from select. This commit fixes trac:2922.
This commit was SVN r25520.

The following Trac tickets were found above:
  Ticket 2922 --> https://svn.open-mpi.org/trac/ompi/ticket/2922
2011-11-29 16:49:59 +00:00
Terry Dontje
b1bb339d23 fix r25507 rationalization of rsh support by removing include of plm_base_rsh_support.h from tm module
This commit was SVN r25519.

The following SVN revision numbers were found above:
  r25507 --> open-mpi/ompi@b475421c16
2011-11-29 11:49:41 +00:00
Samuel Gutierrez
375162c693 this commit fixes a few things. 1. silence warning in common sm. 2. remove unneeded config code in common sm. 3. move opal_shmem_base_close to a better place in opal_finalize. 4. fix opal_path_nfs output.
This commit was SVN r25518.
2011-11-28 23:41:19 +00:00
Ralph Castain
0d55a3d739 Missed one spot...
This commit was SVN r25517.
2011-11-28 22:30:53 +00:00
Ralph Castain
237c79b6d7 Fix daemon collectives - missed the one spot where returning orte_routed_tree_t was required. Sigh. Change the routed components to return that type on the list of children when get_routing_tree is called.
This commit was SVN r25516.
2011-11-28 22:24:49 +00:00
George Bosilca
0bd2bf9aae The number of segments accepted should be bounded by MCA_BTL_DES_MAX_SEGMENTS
and not by 2.

This commit was SVN r25515.
2011-11-28 17:19:12 +00:00
Nathan Hjelm
f8c8c641f1 added asserts to warn developers that ob1/csum match fragments do not support more than 2 segments
This commit was SVN r25514.
2011-11-28 16:12:25 +00:00
Samuel Gutierrez
b4edf0ff5c getting ready for 1.5 port of the shared memory enhancements. remove some unused/unneeded stuff and minor style update.
This commit was SVN r25513.
2011-11-28 16:08:32 +00:00
Ralph Castain
89e5bd27a2 Fix copyright date
This commit was SVN r25512.
2011-11-28 15:54:04 +00:00
George Bosilca
5751c45916 Actually ... OMPI is lacking visibility, as it was all moved down in OPAL.
This commit was SVN r25511.
2011-11-28 04:27:06 +00:00
Ralph Castain
70ab8422b1 Per the internal comments, the delay between ssh invocations is not there for debugging purposes, but rather to allow for NIS authentication times. We have seen that problem in the past, so don't just do the delay when we are debugging - use the delay for the intended purpose. Also, allow for shorter than second-level delays as it doesn't always have to be so long.
This commit was SVN r25510.
2011-11-27 01:49:42 +00:00
Ralph Castain
b173316b74 Don't induce a delay between spawns unless specifically asked to do so
This commit was SVN r25509.
2011-11-26 16:50:31 +00:00
Ralph Castain
1062e2c88a Update ignores for new hg versions
This commit was SVN r25508.
2011-11-26 02:34:27 +00:00
Ralph Castain
b475421c16 As promised, rationalize the rsh support. Remove rshbase and the base rsh support, centralizing all rsh support into the rsh component. Remove the "slave" launch support as that experiment is complete. Fix tree spawn and make that the default method for rsh launch, turning it "off" for qrsh as that system does not support tree spawn.
This commit was SVN r25507.
2011-11-26 02:33:05 +00:00
Matthias Jurenz
a841ee2ae7 Updated svn:ignore's
This commit was SVN r25506.
2011-11-25 12:32:17 +00:00
Matthias Jurenz
6b879bcf6e Changes to OTF:
- otfprofile[-mpi]:
                - fixed compile error with the PGI compiler

Changes to VT:
        - added support for LIBC [I/O] tracing on Cray XT platforms
        - vtrun:
                - do preload Dyninst runtime library (DYNINSTAPI_RT_LIB) when
                  instrumenting user functions by Dyninst

This commit was SVN r25505.
2011-11-25 11:51:08 +00:00
Brian Barrett
2bb447c804 * Shouldn't have a timer header for sync_builtin, since it doesn't actually
have timer support
* Default timer size should be a long, not an int.  Int will roll over way
  too fast, with no performance benefit on 64 bit machines...

This commit was SVN r25501.
2011-11-23 17:05:01 +00:00
Brian Barrett
5cd5ef623d Fix compatibility implementation of swap. Turns out that you shouldn't test
the compatibility code on a platform which has a native swap.  Sorry to all!

This commit was SVN r25500.
2011-11-23 16:28:00 +00:00
Brian Barrett
f971a541f1 Implement swap in terms of compare and swap if it isn't implemented directly
This commit was SVN r25499.
2011-11-23 05:57:52 +00:00
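
The technique named in commit f971a541f1, building an atomic swap out of compare-and-swap, follows a standard retry loop. The sketch below only simulates the idea: `AtomicCell` is a hypothetical stand-in that uses a lock to emulate a hardware compare-and-swap, and none of these names come from the Open MPI sources.

```python
import threading


class AtomicCell:
    """Toy cell whose compare_and_swap is made atomic with a lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new) -> bool:
        # Store `new` only if the cell still holds `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def swap(cell: AtomicCell, new):
    """Atomically replace the cell's value and return the old one."""
    while True:
        old = cell.load()
        # Retry if another thread changed the value after we read it.
        if cell.compare_and_swap(old, new):
            return old
```

The loop retries only when the value changed between the read and the compare-and-swap, so the returned value is always the one that was actually replaced.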
Brian Barrett
86f555121c Add (optional/last ditch effort) support for GCC/Intel __sync_ builtin atomic
operations.  Much easier than adding support for a new architecture.

This commit was SVN r25498.
2011-11-23 04:25:41 +00:00
Ralph Castain
9b59d8de6f This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build.
Brian dealt with this in the past by creating platform files and using "no-build" to block the components. This was clunky, but acceptable when only one organization was using that option. However, that number has now expanded to at least two more locations.

Accordingly, make --without-rte-support actually work by adding appropriate configury to prevent components from building when they shouldn't. While doing so, remove two frameworks (db and rmcast) that are no longer used as ORCM comes to a close (besides, they belonged in ORCM now anyway). Do some minor cleanups along the way.

This commit was SVN r25497.
2011-11-22 21:24:35 +00:00
Ralph Castain
cce0949bda Update ignores
This commit was SVN r25496.
2011-11-22 16:38:54 +00:00
Terry Dontje
1f53b32216 This commit fixes trac:2917 by using the cleaned-up version of check_visibility that is in the hwloc trunk repo.
This commit was SVN r25495.

The following Trac tickets were found above:
  Ticket 2917 --> https://svn.open-mpi.org/trac/ompi/ticket/2917
2011-11-22 00:01:09 +00:00
George Bosilca
1000af1c48 No need to abort there; returning an error triggers the
abort at the upper level.

This commit was SVN r25494.
2011-11-18 19:07:26 +00:00
Ralph Castain
866edf6a89 Now that George has found his problem, we no longer need the bozo check. Interesting how these platform-specific issues surface...
This commit was SVN r25493.
2011-11-18 17:43:14 +00:00
George Bosilca
b613c7eacb Fix the issue with the round robin mapper. When mixing
different precisions, one should manually promote the
participants to the expected type. In this particular
example as opal_list_get_size returns an unsigned long,
the computation on the left side is translated to an
unsigned. If the hostfile contains more nodes than are
required (via the -np), this leads to a gigantic value
for the balance, and breaks the round robin algorithm.

This commit was SVN r25492.
2011-11-18 17:03:35 +00:00
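
The wraparound described in commit b613c7eacb can be reproduced with a small simulation. The exact expression in the mapper is not shown in the commit message, so the subtraction below is a hypothetical stand-in, and the sketch assumes a 64-bit unsigned long (Python integers do not wrap, so a mask emulates it).

```python
MASK64 = (1 << 64) - 1  # assumes unsigned long is 64 bits wide


def balance_unsigned(np_requested: int, num_nodes: int) -> int:
    """Emulate the subtraction when it is carried out in unsigned arithmetic."""
    return (np_requested - num_nodes) & MASK64


def balance_signed(np_requested: int, num_nodes: int) -> int:
    """Same subtraction after promoting the operands to a signed type."""
    return np_requested - num_nodes
```

With 2 processes requested and 4 nodes in the hostfile, the signed result is -2, while the unsigned result wraps to 2**64 - 2, the "gigantic value" the commit message refers to.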
Ralph Castain
1e5e9bde77 Add protection against a bozo case where we could end up in an infinite loop while calculating ranks
This commit was SVN r25491.
2011-11-18 15:35:55 +00:00