
2427 Commits

Author SHA1 Message Date
Ralph Castain
3c8aa7c296 Don't just hardcode the max length of the PMI name as it could be wrong. PMI2 installations seem to be retaining at least some of the PMI functions, so use the one to get the max name length.
This commit was SVN r28962.
2013-07-30 14:13:15 +00:00
Nathan Hjelm
99adeb7f6e Fix support for complex datatypes when Fortran is not available but _Complex is
This commit was SVN r28951.
2013-07-25 19:08:21 +00:00
Nathan Hjelm
ebbb32120a MCA/base: variable system updates
 - Use an enumerator to handle bool values.

 - Fix a leak in the variable enumerator.

 - Fix a leak in an orte parameter.

This commit was SVN r28949.
2013-07-25 15:42:01 +00:00
Ralph Castain
41f97931e9 Need to include module-level CPPFLAGS so it can build
This commit was SVN r28947.
2013-07-24 23:07:43 +00:00
Nathan Hjelm
c4c69b4ddf MPI-3: add support for large counts using derived datatypes
Add support for MPI_Count type and MPI_COUNT datatype and add the required
MPI-3 functions MPI_Get_elements_x, MPI_Status_set_elements_x,
MPI_Type_get_extent_x, MPI_Type_get_true_extent_x, and MPI_Type_size_x.
This commit adds only the C bindings. Fortran bindings will be added in
another commit. For now the MPI_Count type is defined to have the same size
as MPI_Offset. The type is required to be at least as large as MPI_Offset
and MPI_Aint. The type was initially intended to be a ssize_t (if it was
the same size as a long long) but there were issues compiling romio with
that definition (despite the inclusion of stddef.h).

I updated the datatype engine to use size_t instead of uint32_t to support
large datatypes. This will require some review to make sure that 1) the
changes are beneficial, 2) nothing was broken by the change (I doubt
anything was), and 3) there are no performance regressions due to this
change.

Increase the maximum number of predefined datatypes to support MPI_Count

Put common get_elements code to ompi/datatype/ompi_datatype_get_elements.c

Update MPI_Get_count to reflect changes in MPI-3 (return MPI_UNDEFINED when the count is too large for an int)

This commit was SVN r28932.
2013-07-23 15:35:14 +00:00
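The MPI_Get_count change described in c4c69b4ddf can be illustrated with a minimal sketch. It is not the Open MPI implementation: `sketch_get_count` and `SKETCH_UNDEFINED` are hypothetical names, and the numeric value of the stand-in constant is an assumption. The function only mirrors the rules in the MPI standard: a zero-size type yields a count of zero, a partial element yields the undefined marker, and so does a count that does not fit in an int.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-in for MPI_UNDEFINED; the value is an assumption for this sketch. */
#define SKETCH_UNDEFINED (-32766)

/* Hypothetical helper: number of whole elements of size type_size in
 * received_bytes, or SKETCH_UNDEFINED when no valid int count exists. */
static int sketch_get_count(size_t received_bytes, size_t type_size)
{
    if (type_size == 0) {
        return 0;                      /* zero-size datatype: count is 0 */
    }
    if (received_bytes % type_size != 0) {
        return SKETCH_UNDEFINED;       /* not a whole number of elements */
    }
    size_t n = received_bytes / type_size;
    if (n > (size_t)INT_MAX) {
        return SKETCH_UNDEFINED;       /* too large for an int (MPI-3 rule) */
    }
    return (int)n;
}
```

Callers that need the full value would use a wider count type, which is the role the commit gives to MPI_Count and the `_x` functions.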
Ralph Castain
6c1a140e99 Per request from Nathan, add a "commit" API to the opal db framework. This allows him to aggregate keys to work around the Cray's severe PMI limitations
This commit was SVN r28917.
2013-07-22 22:57:16 +00:00
Jeff Squyres
49b5342130 After talking with Nathan, update some comments/documentation about
the new MCA var and pvar systems.

This commit was SVN r28913.
2013-07-22 20:34:42 +00:00
Nathan Hjelm
61d331d5b5 MCA/base: fix some warnings and an error in the MCA variable system
This commit was SVN r28909.
2013-07-22 17:52:39 +00:00
Brian Barrett
0d8b57211a add missing include
This commit was SVN r28900.
2013-07-21 20:18:17 +00:00
Nathan Hjelm
1e8ba2b8cf fix condition in common/pmi init that caused pmi to fail if PMI2_Init succeeds
This commit was SVN r28856.
2013-07-19 02:43:42 +00:00
Ralph Castain
4eb0dfa039 This has apparently been wrong for some time! Fix the common/pmi libraries so we build them dynamic so they can be properly linked into the components that use them. Define required library version numbers and do some other cuteness to make it all work.
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r28842.
2013-07-18 18:42:42 +00:00
Ralph Castain
92c6b806b9 Based on a patch submitted by Piotr Lesnicki of Bull, cleanup the PMI2 support. This has not been tested yet on multiple environments (e.g., Cray), so it needs more evaluation prior to moving to the 1.7 branch.
cmr:v1.7.3:reviewer=rhc

This commit was SVN r28837.
2013-07-18 14:46:07 +00:00
Nathan Hjelm
b88509af36 don't close components that failed to register. cmr:v1.7:reviewer=rhc
This commit was SVN r28823.
2013-07-17 19:49:05 +00:00
Nathan Hjelm
456de007a8 ignore unavailable components when registering
This commit was SVN r28802.
2013-07-16 16:02:33 +00:00
Nathan Hjelm
d446675526 MCA: Per-RFC, add support for performance variables
This commit adds an API for registering and querying performance
variables (mca_base_pvar) in the MCA base. The existing MCA variable
system API has been updated to reflect the new API: MCA variable
groups have performance variables, and new types have been added (double,
unsigned long long) to reflect what is required by the MPI_T
interface. Additionally, the MCA variable group code has been split
into its own set of files: mca_base_var_group.[ch].

Details of the new API can be found in doxygen comments in the header:
mca_base_pvar.h.

Other changes to the variable system:

 - Use an opal_hash_table to speed up variable/group lookup.

 - Clean up code associated with MCA variable types.

 - Registered performance variables are printed by ompi_info -a. In the
   future an option should be added to control this behavior.

Changes to OMPI:

 - Added full support for the MPI_T performance variable interface.

This commit was SVN r28800.
2013-07-16 16:02:13 +00:00
Ralph Castain
10ca1c1b04 Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature".
This commit was SVN r28789.
2013-07-14 18:57:20 +00:00
Jeff Squyres
14424daf4c Remove auto-generated file
This commit was SVN r28784.
2013-07-13 20:55:09 +00:00
Nathan Hjelm
8f9b7926ec mca/base: fix component selection negation. cmr:v1.7:reviewer=jsquyres
This commit was SVN r28770.
2013-07-12 17:55:20 +00:00
Ralph Castain
b001d31c27 Per RFC, remove libevent 2.0.19 and leave 2.0.21 as the default
This commit was SVN r28767.
2013-07-12 16:37:15 +00:00
Jeff Squyres
9252afdcd9 Updates and tweaks to the documentation of the new MCA parameter
system (written in conjunction with Nathan).

This commit was SVN r28758.
2013-07-11 20:04:51 +00:00
Jeff Squyres
bdb45a2e4f Add an oh-so-slightly faster variant of the hotel "checkin" action
(since this is used in the fast path) for when you ''know'' that there
will be a room available:

 * Don't do the last_unoccupied_room check
 * Return void

This commit was SVN r28757.
2013-07-11 20:00:37 +00:00
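The difference between the two "checkin" variants in bdb45a2e4f can be sketched as follows. This is a simplified stand-in, not the opal hotel code: the `sketch_hotel_*` names, the fixed room count, and the return values are all assumptions made for illustration. The checked variant tests for a free room and reports failure; the fast variant skips that test and returns void, so it is only safe when the caller already knows a room is free.

```c
#include <stddef.h>

#define SKETCH_ROOMS 4   /* hypothetical fixed hotel size */

typedef struct {
    void *occupants[SKETCH_ROOMS];   /* who is in each room, NULL if empty */
    int unoccupied[SKETCH_ROOMS];    /* stack of free room numbers */
    int last_unoccupied;             /* top of that stack, -1 when full */
} sketch_hotel_t;

static void sketch_hotel_init(sketch_hotel_t *h)
{
    for (int i = 0; i < SKETCH_ROOMS; ++i) {
        h->occupants[i] = NULL;
        h->unoccupied[i] = i;
    }
    h->last_unoccupied = SKETCH_ROOMS - 1;
}

/* Checked variant: returns 0 on success, -1 when the hotel is full. */
static int sketch_hotel_checkin(sketch_hotel_t *h, void *occupant, int *room)
{
    if (h->last_unoccupied < 0) {
        return -1;
    }
    *room = h->unoccupied[h->last_unoccupied--];
    h->occupants[*room] = occupant;
    return 0;
}

/* Fast variant: no free-room check and no return value. The caller must
 * guarantee that at least one room is unoccupied. */
static void sketch_hotel_checkin_fast(sketch_hotel_t *h, void *occupant,
                                      int *room)
{
    *room = h->unoccupied[h->last_unoccupied--];
    h->occupants[*room] = occupant;
}
```

Dropping the branch and the return value is what makes the second form slightly cheaper on a fast path.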
Nathan Hjelm
a694bcb6b6 Add support for the MCA variable information level to ompi_info.
Add an option to ompi_info (-l, --level) that takes a number in the
interval [1,9]. Only MCA variables up to this level will be printed.
The default level is 1.

Print the level as part of both the parsable and readable output.

This commit was SVN r28750.
2013-07-10 18:52:36 +00:00
Ralph Castain
028f5ee7a6 Cleanup some bitrot from moving the db framework to opal and from the new mca param system
This commit was SVN r28741.
2013-07-09 14:37:08 +00:00
Ralph Castain
315da8125d Remove stale headers
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r28732.
2013-07-08 18:26:58 +00:00
Ralph Castain
2ccc0438af On some systems, pthread_kill is actually in the "signal.h" header, so include it
This commit was SVN r28731.
2013-07-08 17:40:38 +00:00
Ralph Castain
eac174e624 For purposes of testing the RFC, make libevent2021 the default for now so it gets tested by MTT
This commit was SVN r28730.
2013-07-05 23:14:22 +00:00
Brian Barrett
ea9cee73c1 Per RFC, remove darwin backtrace, since OS X has supported the
execinfo (backtrace()) interface since 10.5 (which has been the default for OMPI to use on Darwin)

This commit was SVN r28727.
2013-07-05 19:06:27 +00:00
Ralph Castain
21c8041a40 Update libevent 2021 component so it also only warns once when detecting reentrant behavior
This commit was SVN r28721.
2013-07-04 04:41:04 +00:00
Ralph Castain
bd65937bf3 If we enable ipv6, we resolve a host's addresses and check them all against our local interfaces to determine if the given host is us. However, if we don't enable ipv6, we only checked the first address returned. This can cause us to incorrectly identify a hostname as "not us".
Make --disable-ipv6 behave the same as --enable-ipv6 by checking all the returned addresses.

This commit was SVN r28716.
2013-07-03 21:41:36 +00:00
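The fix in bd65937bf3 amounts to comparing every resolved address against the local interfaces instead of only the first one. A minimal sketch of that comparison, assuming the addresses are already available as strings (the function name and the string representation are hypothetical, and no resolver call is shown):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if ANY resolved address of the host matches
 * ANY local interface address. Checking only resolved[0] would miss hosts
 * whose first address is not one of ours. */
static bool sketch_host_is_local(const char *const *resolved, size_t n_resolved,
                                 const char *const *local, size_t n_local)
{
    for (size_t i = 0; i < n_resolved; ++i) {
        for (size_t j = 0; j < n_local; ++j) {
            if (strcmp(resolved[i], local[j]) == 0) {
                return true;
            }
        }
    }
    return false;
}
```

With resolved addresses `10.0.0.7` and `192.168.1.5` and a local interface at `192.168.1.5`, a first-address-only check reports "not us", while the loop above finds the match.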
Ralph Castain
45fad1ddcc We really should be closing the event framework when told to do so.
cmr:v1.7.3,reviewer=jsquyres

This commit was SVN r28714.
2013-07-03 16:57:14 +00:00
Ralph Castain
9166a8cc95 Per telecon today, add a flag so we only warn once about reentrant libevent loops - this will allow developers to better diagnose the problem as we won't swamp filesystems with warning messages.
This commit was SVN r28712.
2013-07-03 04:51:36 +00:00
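The "warn only once" flag from 9166a8cc95 follows a common pattern, sketched here under assumed names (the variable, function, and message text are hypothetical and are not taken from the libevent component):

```c
#include <stdbool.h>
#include <stdio.h>

/* Set after the first warning so later detections stay silent. */
static bool sketch_warned_reentrant = false;

/* Hypothetical helper: prints the warning the first time it is called
 * and returns true; every later call returns false without printing. */
static bool sketch_warn_reentrant_once(void)
{
    if (sketch_warned_reentrant) {
        return false;
    }
    sketch_warned_reentrant = true;
    fprintf(stderr, "warning: reentrant event loop detected "
                    "(further occurrences will not be reported)\n");
    return true;
}
```

This sketch is not thread-safe; a real implementation shared between threads would need an atomic flag or a lock.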
Jeff Squyres
ad16bcd6d1 Followup from Justin Bronder: Looks like I spoke too soon. The
sandbox team has informed me that they are getting rid of SANDBOX_PID
in the future and that using SANDBOX_ON would be preferred.

This commit was SVN r28708.
2013-07-03 01:38:26 +00:00
Jeff Squyres
fea15ec34e Add memory hooks override for Gentoo sandbox v2.5, too. Thanks to
Justin Bronder for the patch.

This commit was SVN r28702.
2013-07-02 12:34:51 +00:00
George Bosilca
a5bda43cfc Small typo.
This commit was SVN r28689.
2013-07-01 16:48:45 +00:00
Ralph Castain
446e33a5d8 There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes.
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.

This commit was SVN r28682.
2013-06-27 03:04:50 +00:00
Jeff Squyres
dd25421d48 Convert strcpy() to strncpy(), and just to be extra-super paranoid,
use memset(0) for extra bonus points.

This commit was SVN r28668.
2013-06-22 12:21:18 +00:00
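The strcpy()-to-strncpy() conversion in dd25421d48 can be illustrated with a small sketch. The helper name is hypothetical; the point is that zeroing the buffer first and copying at most `dst_len - 1` bytes guarantees a terminated string even when the source is too long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bounded copy that always leaves dst terminated.
 * memset() zeroes the whole buffer, and strncpy() is limited to
 * dst_len - 1 bytes, so the final byte stays '\0'. */
static void sketch_copy_name(char *dst, size_t dst_len, const char *src)
{
    assert(dst_len > 0);
    memset(dst, 0, dst_len);
    strncpy(dst, src, dst_len - 1);
}
```

Over-long input is silently truncated, so callers that must detect truncation need to compare `strlen(src)` with `dst_len - 1` themselves.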
Joshua Ladd
0b5c1f2ea8 Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fall back to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
George Bosilca
f5a55ccb39 Various cleanups.
This commit was SVN r28647.
2013-06-15 16:23:11 +00:00
George Bosilca
a6c3477e89 Remove useless include.
This commit was SVN r28646.
2013-06-15 16:07:30 +00:00
Nathan Hjelm
8924140916 Per RFC: use a better hash algorithm for the opal_hash_table_*_ptr functions.
Chose the crc32 function present in opal/util/crc.c as the hash function. The
performance should be sufficient for most cases. If not we can always change
the function again.

This commit was SVN r28629.
2013-06-13 17:11:04 +00:00
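The idea in 8924140916 of hashing pointer keys with a CRC can be sketched as below. This uses the standard reflected CRC-32 (polynomial 0xEDB88320), which may differ from the function in opal/util/crc.c; the `sketch_*` names and the bucket scheme are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over a byte buffer (standard reflected variant). */
static uint32_t sketch_crc32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical bucket selection: hash the bytes of the pointer value
 * itself (not the memory it points to) and reduce modulo the table size. */
static size_t sketch_ptr_bucket(const void *key, size_t n_buckets)
{
    assert(n_buckets > 0);
    return (size_t)(sketch_crc32(&key, sizeof key) % n_buckets);
}
```

A CRC spreads nearby pointer values across buckets better than simply taking low-order address bits, which tend to be identical for aligned allocations.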
Nathan Hjelm
518d1fe200 Fix two typos that prevented alps direct launch from working
This commit was SVN r28628.
2013-06-13 17:04:08 +00:00
Joshua Ladd
46362d2761 Stomps compiler warnings in HCA min-dist calculation. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28620.
2013-06-12 16:25:25 +00:00
Tom Naughton
d86c3ce669 + remove autogenerated 'install-sh'
This commit was SVN r28602.
2013-06-07 20:40:24 +00:00
Rolf vandeVaart
62ab008017 Fix SEGV caused by missing CUDA initialization.
This commit was SVN r28601.
2013-06-07 18:31:36 +00:00
Rolf vandeVaart
1230029aa1 The debug messages were swapped. Fixed.
This commit was SVN r28600.
2013-06-07 17:23:41 +00:00
George Bosilca
72877f078f According to MPI 3.0, a count equal to zero has a clear meaning: no
modification of the original datatype is allowed (neither in the type map nor
in the extent). Make this clear in the code.
Allow 0-count cases in the contiguous memory check.

This commit was SVN r28568.
2013-05-29 16:02:54 +00:00
Jeff Squyres
6d173af329 This commit introduces a new "mindist" ORTE RMAPS mapper, as well as
some relevant updates/new functionality in the opal/mca/hwloc and
orte/mca/rmaps bases.  This work was mainly developed by Mellanox,
with a bunch of advice from Ralph Castain, and some minor advice from
Brice Goglin and Jeff Squyres.

Even though this is mainly Mellanox's work, Jeff is committing only
for logistical reasons (he holds the hg+svn combo tree, and can
therefore commit it directly back to SVN).

-----

Implemented distance-based mapping algorithm as a new "mindist"
component in the rmaps framework.  It maps processes to NUMA nodes
according to PCI locality information as reported by the BIOS, from the
NUMA node closest to the device to the furthest.

To use this algorithm, specify:

   {{{mpirun --map-by dist:<device_name>}}}

where <device_name> can be mlx5_0, ib0, etc.

There are two modes provided:

 1. bynode: load-balancing across nodes
 2. byslot: go through slots sequentially (i.e., the first nodes are
     more loaded)

These options are regulated by the optional ''span'' modifier; the
command line parameter looks like:

    {{{mpirun --map-by dist:<device_name>,span}}}

So, for example, if there are 2 nodes, each with 8 cores, and we'd
like to run 10 processes, the mindist algorithm will place 8 processes
on the first node and 2 on the second by default. But if you want to
place 5 processes on each node, you can add the span modifier to your
command line to do that.

If there are two NUMA nodes on the node, each with 4 cores, and we run
6 processes, the mindist algorithm will try to find the NUMA node
closest to the specified device and, if successful, will place 4
processes on that NUMA node and the remaining two on the next one.

You can also specify the number of cpus per MPI process. This option
is handled so that we map as many processes to the closest NUMA as we
can (number of available processors at the NUMA divided by number of
cpus per rank) and then go on with the next closest NUMA.

The default binding option for this mapping is bind-to-numa. It applies
when you don't specify any binding policy. If you specify a binding
level that is "lower" than NUMA (e.g., hwthread, core, socket), processes
are bound at whatever level you specify.

This commit was SVN r28552.
2013-05-22 13:04:40 +00:00
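The placement arithmetic described in 6d173af329 can be sketched as two small functions. This is not the rmaps mindist code: the function names and the flat array of per-target CPU counts are assumptions, and the targets are taken to be already sorted from closest to furthest. Each target (a node or a NUMA node) can hold `cpus / cpus_per_proc` processes.

```c
#include <assert.h>

/* Default behaviour: fill each target in order until it is full.
 * Returns the number of processes that could not be placed. */
static int sketch_map_fill(int nprocs, int cpus_per_proc,
                           const int *cpus, int ntargets, int *placed)
{
    assert(cpus_per_proc > 0);
    for (int i = 0; i < ntargets; ++i) {
        int capacity = cpus[i] / cpus_per_proc;
        int take = nprocs < capacity ? nprocs : capacity;
        placed[i] = take;
        nprocs -= take;
    }
    return nprocs;
}

/* "span" behaviour: hand out one process per target per pass, so the
 * load is balanced. Returns the number of unplaced processes. */
static int sketch_map_span(int nprocs, int cpus_per_proc,
                           const int *cpus, int ntargets, int *placed)
{
    assert(cpus_per_proc > 0);
    for (int i = 0; i < ntargets; ++i) {
        placed[i] = 0;
    }
    int progress = 1;
    while (nprocs > 0 && progress) {
        progress = 0;
        for (int i = 0; i < ntargets && nprocs > 0; ++i) {
            if (placed[i] < cpus[i] / cpus_per_proc) {
                placed[i]++;
                nprocs--;
                progress = 1;
            }
        }
    }
    return nprocs;
}
```

For two 8-core nodes and 10 processes, the fill variant gives 8 and 2, and the span variant gives 5 and 5, matching the example in the commit message. For two 4-core NUMA nodes and 6 processes, the fill variant gives 4 and 2.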
Jeff Squyres
55382c1bf8 Bring over upstream hwloc trunk commit
https://svn.open-mpi.org/trac/hwloc/changeset/5592 to fix the merging
of groups when they are I/O objects.

This commit was SVN r28551.
2013-05-22 12:34:59 +00:00
Nathan Hjelm
721779d7ab Per RFC: remove old MCA parameter system.
This commit was SVN r28541.
2013-05-20 15:36:13 +00:00
Jeff Squyres
089c632cce Remove a bunch of dead code: gcc 4.7 warns of set-but-unused
variables.  So get rid of them.

This commit was SVN r28538.
2013-05-17 21:45:49 +00:00