Gilles Gouaillardet
7d6b75f3b2
orte_util_snprintf_jobid: return ORTE_SUCCESS or ORTE_ERROR
2016-01-18 09:44:33 +09:00
Edgar Gabriel
a9ca37059a
improve the communication abstraction. This commit also allows all aggregators to work simultaneously, instead of the slightly staggered way of the previous version.
2016-01-17 09:48:49 -06:00
Edgar Gabriel
56e11bfc97
initialize the stripe_size variable as well.
2016-01-17 09:48:49 -06:00
Edgar Gabriel
26c57ef374
separate the size of the buffer used for the shuffle step and the size of the buffer used for a pwritev operation.
2016-01-17 09:48:49 -06:00
Edgar Gabriel
39d5c8c281
further bug fixes silencing a compiler warning and fixing a memory overrun
2016-01-17 09:48:49 -06:00
Edgar Gabriel
2bcae84e11
further debugging
2016-01-17 09:48:49 -06:00
Edgar Gabriel
2bdd6ba17a
correctly free some buffers, and ensure that lustre_stripe_size and stripe_count are always read from the file system.
2016-01-17 09:48:49 -06:00
Edgar Gabriel
4bbb22bd0b
add a new field to the ompio data structure (stripe_count) and set it correctly on pvfs2 and lustre.
2016-01-17 09:48:49 -06:00
Edgar Gabriel
d282e94b67
add the new dynamic_gen2 component, designed to coexist for now with the original dynamic component
2016-01-17 09:48:49 -06:00
rhc54
b172b8599b
Merge pull request #1285 from ggouaillardet/topic/pmix_dist_fix
...
pmix: do not include automatically generated include/private/autogen/…
2016-01-16 20:49:41 -08:00
Ralph Castain
fc6b260146
Protect against PMIx-based requests that don't come thru the MPI comm_spawn interface
2016-01-16 13:36:06 -08:00
Ralph Castain
4dad5de8ff
Silence a couple of warnings - strncpy returns a char*, not an int
2016-01-16 09:44:52 -08:00
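The warning fixed above comes from treating the result of strncpy() as a status code. As a minimal sketch (copy_name is a hypothetical helper, not a function from the Open MPI tree), the following shows that strncpy() returns its destination pointer, and that the caller still has to terminate the string when the source is too long:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bounded copy that always NUL-terminates.
 * strncpy() returns dst (a char *), never an error code, so there
 * is nothing meaningful to compare against an int status. */
char *copy_name(char *dst, size_t dst_len, const char *src)
{
    char *ret;

    assert(dst != NULL && src != NULL && dst_len > 0);

    /* Copy at most dst_len - 1 bytes; strncpy() does not add a
     * terminator when src is longer than that. */
    ret = strncpy(dst, src, dst_len - 1);
    dst[dst_len - 1] = '\0';

    return ret; /* same pointer as dst */
}
```

Because the return value is always the destination pointer, it carries no error information; truncation has to be detected separately, for example by comparing strlen(src) with the buffer size.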
Jeff Squyres
348ac507c2
usnic: explain why we still have OPAL_HAVE_HWLOC
...
Put in a comment explaining why btl_usnic_compat.h still defines
OPAL_HAVE_HWLOC, even though master/v2.x no longer does.
2016-01-16 04:11:05 -08:00
Jeff Squyres
0f5fcf9029
usnic: fix common symbol
2016-01-16 03:55:27 -08:00
Jeff Squyres
6c96cb1ad0
find_common_syms: arrgh -- re-add the x bit
...
Previous commit accidentally removed the x bit from this script. This
commit puts it back.
2016-01-16 03:53:43 -08:00
Jeff Squyres
60ffe713b8
common syms: whitelist bison-generated common symbols
...
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-01-16 03:53:14 -08:00
Jeff Squyres
96f94f8228
fortran: whitelist deliberate common symbols
...
The Fortran library has a number of common symbols that are
deliberate, so whitelist them.
2016-01-16 03:53:14 -08:00
Jeff Squyres
c43d4fd915
find_common_syms: trivial updates
...
Look for "common_sym_whitelist.txt" files (not
"common_sym_whitelist"). Also, skip blank lines in the
whitelist files, too.
2016-01-16 03:53:14 -08:00
rhc54
ef24f710a7
Merge pull request #1303 from timattox/remove_unused_var
...
hwloc_base_util.c: Remove newly unused variable 'i'.
2016-01-15 00:35:14 -08:00
Tim Mattox
958de82471
hwloc_base_util.c: Remove newly unused variable 'i'.
2016-01-14 16:35:47 -05:00
Joshua Ladd
18c5a21562
Fix typo in error handling flow.
2016-01-14 22:28:54 +02:00
Joshua Ladd
afa62d8ca1
Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891
2016-01-14 19:22:27 +02:00
Gilles Gouaillardet
1d38430e43
opal: replace opal_convert_jobid_to_string with opal_snprintf_jobid
2016-01-14 10:39:03 +09:00
Jeff Squyres
e5cf2db3b7
Merge pull request #1291 from jsquyres/pr/hotel-fix
...
opal hotel: only delete events that have not yet fired
2016-01-13 14:51:51 -05:00
Jeff Squyres
270cc11156
opal hotel: only delete events that have not yet fired
...
The eviction callback, for convenience (and to avoid code
duplication), used to call opal_hotel_checkout(). However,
opal_hotel_checkout() deletes the eviction event -- which is fine to
do when opal_hotel_checkout() is invoked by the application. But when
it's invoked by the same event that it's deleting, it can cause Bad
Things to happen.
For simplicity, instead of invoking opal_hotel_checkout() from the
eviction callback, just duplicate the checkout logic into the eviction
callback function (and skip the delete-the-evict-event part).
For good measure, put a comment in all three places where the checkout
logic occurs (because it's inlined): don't change this logic without
changing all 3 places.
Finally, also add a line in the docs for opal_hotel_init() warning
users against calling opal_hotel_checkout() from their eviction
callback.
2016-01-13 10:59:06 -08:00
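The hazard described in the commit above can be modelled with a toy example. This is only a sketch under stated assumptions: toy_event_t, toy_room_t and the toy_* functions are hypothetical stand-ins, not the opal_hotel or libevent types. The checkout path deletes the pending eviction event, while the eviction callback repeats the "free the room" logic and skips the delete, because the event has already fired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer event. */
typedef struct {
    bool pending;      /* true while armed and not yet fired */
    int  delete_calls; /* how many times the event was deleted */
} toy_event_t;

typedef struct {
    void       *occupant; /* NULL when the room is free */
    toy_event_t evict_ev; /* eviction timer for this room */
} toy_room_t;

/* Deleting is only valid for an event that has not fired yet. */
static void toy_event_del(toy_event_t *ev)
{
    assert(ev->pending);
    ev->pending = false;
    ev->delete_calls++;
}

void toy_checkin(toy_room_t *room, void *occupant)
{
    room->occupant = occupant;
    room->evict_ev.pending = true; /* arm the eviction timer */
}

/* Called by the application: cancel the timer, then free the room. */
void toy_checkout(toy_room_t *room)
{
    toy_event_del(&room->evict_ev);
    room->occupant = NULL;
}

/* Called when the timer fires: same room-freeing logic, but no
 * delete, since the event is the one currently running. */
void *toy_evict_cb(toy_room_t *room)
{
    void *occupant = room->occupant;

    room->evict_ev.pending = false; /* the timer has fired */
    room->occupant = NULL;
    return occupant;
}
```

In this model, calling toy_checkout() from inside toy_evict_cb() would trip the assertion in toy_event_del(), which is the toy equivalent of the "Bad Things" the commit message mentions.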
rhc54
eb65b5f97e
Merge pull request #1297 from timattox/use_hwloc_bitmap_weight
...
Replace a bit counting loop with an efficient population count
2016-01-13 09:05:13 -08:00
rhc54
26e882c1c3
Merge pull request #1300 from ggouaillardet/topic/oversubscribe
...
orte_rmaps_base_map_job: set OPAL_BIND_ALLOW_OVERLOAD when needed
2016-01-13 09:00:02 -08:00
Gilles Gouaillardet
4c43fb2a50
orte_rmaps_base_map_job: set OPAL_BIND_ALLOW_OVERLOAD when needed
2016-01-13 17:13:36 +09:00
Tomislav Janjusic
3858bc8e62
Adding support for dynamic endpoint creation
...
Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx>
Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com>
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-12 22:17:03 +02:00
Nathan Hjelm
dd4d49cbbb
Merge pull request #1278 from ggouaillardet/poc/osc_pt2pt
...
osc/pt2pt: use two distinct "namespaces" for tags
2016-01-12 09:49:31 -07:00
Tim Mattox
f2d4a8d266
Replace a bit counting loop with a call to an efficient population count routine
2016-01-12 10:48:56 -05:00
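For illustration only, the difference between a bit-counting loop and a population count routine can be sketched on a plain 64-bit mask. The pull request branch name suggests the real change calls hwloc_bitmap_weight(); the functions below are hypothetical and do not reproduce hwloc's implementation. The second function uses Kernighan's method as one example of a cheaper count.

```c
#include <assert.h>
#include <stdint.h>

/* Loop style: test every bit position, always 64 iterations. */
int count_bits_loop(uint64_t mask)
{
    int n = 0;

    for (int i = 0; i < 64; i++) {
        if (mask & ((uint64_t)1 << i)) {
            n++;
        }
    }
    return n;
}

/* Population count style (Kernighan's method): each iteration
 * clears the lowest set bit, so the loop runs once per set bit. */
int count_bits_popcount(uint64_t mask)
{
    int n = 0;

    while (mask != 0) {
        mask &= mask - 1;
        n++;
    }
    return n;
}

/* Debug cross-check that both routines agree on a given mask. */
void check_counts_agree(uint64_t mask)
{
    assert(count_bits_loop(mask) == count_bits_popcount(mask));
}
```

Both functions return the same value for any input; only the amount of work differs.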
Gilles Gouaillardet
955fe85cb6
pmix/pmix120: add missing include file
2016-01-12 11:35:32 +09:00
Nathan Hjelm
b6366e52a8
Merge pull request #1294 from hjelmn/group_fix
...
ompi/group: do not decrement parent group proc pointers in destruct
2016-01-11 13:49:23 -07:00
Nathan Hjelm
d26cc3fece
ompi/group: do not decrement parent group proc pointers in destruct
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-11 12:56:11 -07:00
Ralph Castain
332019b43a
Silence warning
2016-01-10 09:59:36 -08:00
Nathan Hjelm
aefc2ed3e8
Merge pull request #1283 from artpol84/udcm_race_fix
...
Fix race condition in UDCM
2016-01-09 08:19:20 -07:00
Gilles Gouaillardet
0fb7b07a71
opal/progress: fix non debug builds
...
this bug was introduced in open-mpi/ompi@64b695669a
Thanks Pavel (Pasha) Shamis for reporting this issue
2016-01-09 15:47:40 +09:00
Artem Polyakov
84e4fb308b
Fix race condition in UDCM where service thread sees that
...
`cm_message_event_active == 1` but main thread has already stopped
processing messages and thus we will have the situation where one
message was left unhandled leading to a hang.
2016-01-08 23:56:21 +06:00
Gilles Gouaillardet
73daf58ee5
pmix: do not include automatically generated include/private/autogen/config.h into dist tarball
...
Thanks Siegmar Gross for the initial report of this issue
2016-01-08 13:18:15 +09:00
Edgar Gabriel
ac34c0ec51
Merge pull request #1287 from edgargabriel/posix-fbtl-update
...
use the actual preadv and pwritev functions if available. That's what…
2016-01-07 19:48:51 -06:00
Nathan Hjelm
faeca5663c
Merge pull request #1289 from hjelmn/hwloc_fix
...
Update hwloc to 1.11.2 + Fix /proc/mounts issue.
2016-01-07 16:14:55 -07:00
Nathan Hjelm
15007b4e2b
linux: use mntent.h instead of manually parsing /proc/mounts
...
setmntent() doesn't support root_fd, but manual parsing of
/proc/mounts is fragile, and actually buggy for very long mount lines
(see open-mpi/hwloc#142 (comment)).
Since we only openat("/proc/mounts") there, just manually concatenate
the fsroot_path and use setmntent().
Thanks to Nathan Hjelm for the report.
(Cherry-picked from open-mpi/hwloc@d2d07b9a22 )
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-07 12:55:03 -07:00
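A minimal sketch of the approach described above, assuming a glibc-style <mntent.h>: concatenate the filesystem root with /proc/mounts, then let setmntent() and getmntent() do the parsing. The function names and the 4096-byte path buffer are assumptions made for this example and are not taken from the hwloc source.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Join fsroot_path and "/proc/mounts" into out.
 * Returns 0 on success, -1 if the result would not fit. */
int build_mounts_path(char *out, size_t out_len, const char *fsroot_path)
{
    int n;

    assert(out != NULL && fsroot_path != NULL);
    n = snprintf(out, out_len, "%s/proc/mounts", fsroot_path);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Count mount entries whose filesystem type equals fstype.
 * Returns -1 if the mount table cannot be opened. */
int count_mounts_of_type(const char *fsroot_path, const char *fstype)
{
    char path[4096];
    struct mntent *ent;
    FILE *fp;
    int count = 0;

    if (build_mounts_path(path, sizeof(path), fsroot_path) != 0) {
        return -1;
    }

    fp = setmntent(path, "r");
    if (fp == NULL) {
        return -1;
    }

    /* getmntent() handles field splitting and escape sequences,
     * so no hand-written line parser is needed here. */
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_type, fstype) == 0) {
            count++;
        }
    }

    endmntent(fp);
    return count;
}
```

Passing an empty string as fsroot_path reads the live /proc/mounts; passing a directory that holds a captured filesystem tree reads that tree's copy instead.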
Nathan Hjelm
1384559fcd
Update hwloc to v1.11.2
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-07 12:33:12 -07:00
rhc54
73ca19581e
Merge pull request #1288 from hjelmn/grpcomm_fixes
...
grpcomm: fix bugs in grpcomm algorithms
2016-01-07 10:24:03 -08:00
Nathan Hjelm
fab1eca536
grpcomm: fix bugs in grpcomm algorithms
...
This commit fixes multiple issues in the bruck's and recursive
doubling grpcomm algorithms. The following changes are included:
- Use the existing bitmap implementation instead of implementing a
new one. There were bugs in the implementation that caused an
overrun of the bitmap array.
- Clean up the algorithms to eliminate errors.
- Send as little extra data as possible in the bruck's
algorithm.
The changes were tested with various numbers of ortes varying from 1
to 4096.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-07 10:12:08 -07:00
Edgar Gabriel
0a1b735eed
use the actual preadv and pwritev functions if available. That's what the fbtl interfaces have been designed for.
2016-01-07 08:29:17 -06:00
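As a rough illustration of what preadv() and pwritev() provide, the sketch below writes two separate memory buffers to one contiguous file region with a single call and reads them back the same way. The helper names are hypothetical and this is not the fbtl code; it only assumes a platform where both calls are available (for example Linux with glibc).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather-write: buffers a and b land back to back in the file,
 * starting at offset. Returns bytes written, or -1 on error. */
ssize_t write_two_buffers(int fd, off_t offset,
                          const void *a, size_t a_len,
                          const void *b, size_t b_len)
{
    struct iovec iov[2];

    assert(fd >= 0);
    iov[0].iov_base = (void *)a; /* iovec is shared with reads, hence non-const */
    iov[0].iov_len  = a_len;
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = b_len;

    return pwritev(fd, iov, 2, offset);
}

/* Scatter-read: the file region starting at offset is split
 * across a and b. Returns bytes read, or -1 on error. */
ssize_t read_two_buffers(int fd, off_t offset,
                         void *a, size_t a_len,
                         void *b, size_t b_len)
{
    struct iovec iov[2];

    assert(fd >= 0);
    iov[0].iov_base = a;
    iov[0].iov_len  = a_len;
    iov[1].iov_base = b;
    iov[1].iov_len  = b_len;

    return preadv(fd, iov, 2, offset);
}
```

Neither call moves the file position, and both may transfer fewer bytes than requested, so production code would loop on short reads and writes.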
Nysal Jan K.A
13f9bb9202
Use PMI2 constants for consistency
2016-01-07 11:46:22 +05:30
Gilles Gouaillardet
713e3ea2e5
configury: fix pthread_join() call in OPAL_INTL_PTHREAD_TRY_LINK_FORTRAN
2016-01-07 10:20:20 +09:00
Gilles Gouaillardet
4c1ea4a171
dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept()
...
this commit includes missing bits from open-mpi/ompi@213b2abde4
2016-01-07 09:11:03 +09:00
Gilles Gouaillardet
213b2abde4
dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept()
2016-01-06 16:21:13 +09:00