Commit graph

26832 commits

Author SHA1 Message Date
Jeff Squyres
086748bb70 Merge pull request #3102 from omor1/master
Add missing definition of MPI_T_PVAR_SESSION_NULL (resolve #2652)
2017-03-13 15:27:05 -04:00
Ralph Castain
bb574a41df Update launchers to get correct regex
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 11:21:44 -07:00
Ralph Castain
41d7a5c7d9 Merge pull request #3148 from rhc54/topic/cov
Silence Coverity warnings
2017-03-13 11:12:14 -07:00
Howard Pritchard
fac97a474f LICENSE: update according to copyrights in source files
Update dates in the license file for 3.0.0 branch.

[ci skip]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-13 11:43:58 -06:00
Ralph Castain
105fb152e1 Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 08:38:51 -07:00
Ralph Castain
b9f5cab710 Add a minor debug statement
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 18:15:44 -07:00
Gilles Gouaillardet
23d44a5284 sensor/base: initialize orte_sensor_base global variable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-13 09:39:43 +09:00
Ralph Castain
59bcad5f8e Merge pull request #3146 from rhc54/topic/alps
Update alps module to new APIs
2017-03-12 10:35:29 -07:00
Ralph Castain
6d6bc9bd07 Update alps module to new APIs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 09:43:07 -07:00
Ralph Castain
fb27bd1b4a Merge pull request #3143 from rhc54/topic/odls
Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads
2017-03-12 07:29:11 -07:00
Ralph Castain
70591bf4dc Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 20:48:04 -08:00
Ralph Castain
3afadbad89 Merge pull request #3142 from rhc54/topic/sensor
Restore sensor framework
2017-03-11 19:53:45 -08:00
Ralph Castain
ab50665222 Restore sensor framework
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 17:46:32 -08:00
Ralph Castain
74125ecc7a Merge pull request #3141 from rhc54/topic/sync
Sync to latest PMIx master and PMIx reference server
2017-03-11 15:53:18 -08:00
Ralph Castain
c6bc3ccb76 Sync to latest PMIx master and PMIx reference server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 12:50:38 -08:00
Howard Pritchard
df8df0d2f3 Merge pull request #3137 from hppritcha/topic/swap_rmaps_compiler_warning
rmaps/base: swat compiler warning
2017-03-09 15:06:01 -07:00
Howard Pritchard
f8183f71f7 rmaps/base: swat compiler warning
gcc was complaining about variables possibly used uninitialized

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-09 14:30:06 -06:00
Yossi
1a95633e40 Merge pull request #2717 from alex-mikheev/topic/sshmem_ucx
oshmem: sshmem: adds UCX allocator
2017-03-09 12:58:06 +02:00
Jeff Squyres
16ee880c4e README: Remove coll/ml verbiage
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-08 15:58:54 -05:00
Jeff Squyres
17a34b489b Merge pull request #3121 from jsquyres/pr/master/readme-updates-from-2.x
master: README: sync with v2.x
2017-03-08 12:58:19 -05:00
Yossi
327d5a8ac4 Merge pull request #3125 from alex-mikheev/topic/pml_ucx_req_init_fix
ompi: pml ucx: fix persistant request initialization
2017-03-08 19:08:12 +02:00
Ralph Castain
97287f6568 Merge pull request #2916 from rhc54/topic/sim
Create an alternative mapping method
2017-03-08 07:08:51 -08:00
Jeff Squyres
dc12ae008b Merge pull request #3122 from hjelmn/patcher_madvise
memory/patcher: do not hook madvise
2017-03-08 09:46:45 -05:00
Alex Mikheev
c081239f88 ompi: pml ucx: fix persistant request init CR changes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 13:26:29 +02:00
Alex Mikheev
c113c37a7a ompi: pml ucx: fix persistant request initialization
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-08 10:59:41 +02:00
Ralph Castain
48fc339718 Create an alternative mapping method that pushes responsibility
onto the backend daemons. By default, let mpirun only pack the app_context
info and send that to the backend daemons where the mapping will
be done. This significantly reduces the computational time on mpirun as it isn't
running up/down the topology tree computing thousands of binding
locations, and it reduces the launch message to a very small number of
bytes.

When running -novm, fall back to the old way of doing things
where mpirun computes the entire map and binding, and then sends
the full info to the backend daemon.

Add a new cmd line option/mca param --fwd-mpirun-port that allows
mpirun to dynamically select a port, but then passes that back to
all the other daemons so they will use that port as a static port
for their own wireup. In this mode, we no longer "phone home" directly
to mpirun, but instead use the static port to wireup at daemon
start. We then use the routing tree to rollup the initial
launch report, and limit the number of open sockets on mpirun's node.

Update ras simulator to track the new nidmap code

Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found.

Update gadget platform file

Initialize the range count when starting a new range

Fix the no-np case in managed allocation

Ensure DVM node usage gets cleaned up after each job

Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-07 20:43:12 -08:00
Nathan Hjelm
3caeda21dc memory/patcher: do not hook madvise
It is not possible to hook madvise at this time due to a deadlock when
using glibc.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-07 16:26:53 -07:00
Jeff Squyres
3a6b297bd5 README: sync with v2.x
The README on master had grown very, very stale.  This commit copies
the README from the tip of the v2.x branch (from
https://github.com/open-mpi/ompi/pull/3119) and preserves a few minor
differences between master and the v2.x branch.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

[skip ci]
bot:notest
2017-03-07 18:08:26 -05:00
Nathan Hjelm
7240bee0e0 Merge pull request #3110 from hjelmn/osc_pt2pt
osc/pt2pt: flush pending fragments on lock ack
2017-03-07 14:44:09 -07:00
Joshua Ladd
e2ba60b778 Merge pull request #3111 from jladd-mlnx/topic/cx5-device-param
Adding latest ConnectX-5 adapter vendor part id to OpenIB device params.
2017-03-07 13:55:46 -05:00
Nathan Hjelm
15ea9c5524 Merge pull request #3013 from hjelmn/rcache_lifo
rcache/base: do not free memory with the vma lock held
2017-03-07 09:11:04 -07:00
Jeff Squyres
c2adf359cf Merge pull request #3083 from ggouaillardet/topic/hwloc_v15
hwloc: add support for hwloc v1.5
2017-03-07 10:01:24 -05:00
Joshua Ladd
b28647857f Adding latest ConnectX-5 adapter vendor part id to OpenIB device params.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2017-03-07 00:19:54 +02:00
Nathan Hjelm
0195d15401 osc/pt2pt: flush pending fragments on lock ack
This commit addresses an issue that can occur in cases where a lot of
fragments are outstanding.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-06 13:58:46 -07:00
Ralph Castain
79540fec08 Merge pull request #3104 from rhc54/topic/minor
Fix some minor compatibility issues
2017-03-06 10:14:22 -08:00
Edgar Gabriel
607dc2c039 Merge pull request #3103 from edgargabriel/pr/sharedfp-name-collision-fix
sharedfp/lockedfile and sm: fix the namecollision
2017-03-05 14:46:20 -06:00
Ralph Castain
aca7091114 Fix some minor compatibility issues by ensuring job-level data gets stored against wildcard rank in the cray, s1, and s2 components, and that the ext1 component translates all wildcard rank requests into the peer's rank since v1.x of PMIx doesn't understand wildcard ranks
Closes #3101

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-05 10:30:59 -08:00
Edgar Gabriel
2d462b3b80 sharedfp/lockedfile and sm: fix name collision
this fixes the issue reported by Nicolas Joly on the mailing: the sharedfp/lockedfile component does not support right now a scenario where multiple jobs read from the same input file, due to a collision of the filenames utilized for the sharedfp handle. Although not part of the oroginal report, the same occurs for the sharedfp/sm component. Add therefore the jobid to be part of the lockedfilename/sm file name.

use the OMPI_CAST_RTE_NAME macro to determine jobid

Fixes: #3098

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-05 11:28:28 -06:00
Omri Mor
20ab37a297 Add missing MPI_T_PVAR_SESSION_NULL to mpi.h
MPI_T_pvar_session_free() should reject null sessions and set *session to MPI_T_PVAR_SESSION_NULL

Signed-off-by: Omri Mor <omri50@gmail.com>
2017-03-05 09:03:30 -06:00
Mike Dubman
2f8c759b73 Merge pull request #3100 from artpol84/fix_ucx_req/master
ompi/pml/ucx: Fix uninitialized UCX request field.
2017-03-05 08:53:18 +01:00
Artem Polyakov
9448814c40 ompi/pml/ucx: Fix uninitialized UCX request field.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-05 03:06:30 +07:00
Edgar Gabriel
d1fed77781 Merge pull request #3094 from edgargabriel/pr/master-lustre-priority
io/ompio: adjust the priority of the OMPIO component on lustre
2017-03-03 09:29:14 -06:00
KAWASHIMA Takahiro
39294caf04 Merge pull request #3086 from kawashima-fj/pr/coll-base-defs
coll: Update `ompi/mca/coll/base/coll_base_functions.h`
2017-03-03 18:53:00 +09:00
KAWASHIMA Takahiro
7cb42d9aaa Merge pull request #3085 from kawashima-fj/pr/pml-bfo-typo
pml/bfo: Correct a function name and header filenames
2017-03-03 18:48:01 +09:00
Ralph Castain
ddf662ab74 Merge pull request #3097 from rhc54/topic/error
Silence an unnecessary error log
2017-03-02 19:04:44 -08:00
Ralph Castain
1de72ff023 Silence an unnecessary error log
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-02 17:18:34 -08:00
Ralph Castain
a7d42e5f30 Merge pull request #3096 from rhc54/topic/psec
Remove the stale opal/sec framework
2017-03-02 17:15:51 -08:00
Gilles Gouaillardet
7e01be60d9 hwloc: add support for hwloc v1.5
hwloc v1.5 does not support HWLOC_OBJ_OSDEV_COPROC
nor hwloc_topology_dup(), so for this version :
- do not search for coprocessors
- do not try hwloc_topology_dup(), note this is not
  used anywhere in the code base

Thanks Jeff for helping with the wording

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-03 09:39:24 +09:00
Ralph Castain
83199979ba Remove the stale opal/sec framework
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-02 15:41:56 -08:00
Edgar Gabriel
9e19834327 io/ompio: adjust the priority of the OMPIO component on lustre
this commit brings over the behavior from the 2.x series to master, mostly with the fork for the 3.x series in mind.
Also, use strncasecmp instead of two strncmps

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-03-02 12:10:11 -06:00