Commit graph

26809 commits

Author SHA1 Message Date
Artem Polyakov
1f7a3a2d54 ompi: Avoid unnecessary PMIx lookups when adding procs (step 2).
Follow-up for 717f3fef62.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-03-16 07:47:27 +07:00
Jeff Squyres
60ca372d60 NEWS: Sync with v2.0.x and v1.10 releases
Pull in content from v1.10 and v2.0.x branches.

[skip ci]
bot:notest

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-15 09:56:31 -07:00
Howard Pritchard
1709febdea Merge pull request #3166 from hppritcha/topic/swat_state_orted_comp_warning
ORTED: swat another compiler warning
2017-03-15 08:40:59 -06:00
Howard Pritchard
1f7378d7e4 Merge pull request #3151 from hppritcha/topic/update_license_file
LICENSE: update according to copyrights in source files
2017-03-15 07:58:27 -06:00
Howard Pritchard
8e4689c2b8 v3.x: updates for branching v3.x
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-14 14:03:47 -06:00
Howard Pritchard
c2da14d514 AUTHORS: update for 3.0.0 branching
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-14 13:54:56 -06:00
Ralph Castain
96d7d10c1d Merge pull request #3170 from rhc54/topic/reg
Ensure the backend daemons know if we are in a managed allocation and if the HNP was included in the allocation
2017-03-14 12:48:09 -07:00
Nathan Hjelm
db9232f8d6 Merge pull request #3169 from hjelmn/btl_ugni_2_0
More btl/ugni updates
2017-03-14 13:23:13 -06:00
Nathan Hjelm
37214eda09 Merge pull request #3164 from hjelmn/ob1_pinned
pml/ob1: do not cache leave_pinned
2017-03-14 13:22:18 -06:00
Mike Dubman
ccac7e5363 Merge pull request #3157 from vspetrov/c_coll_allgather_usage_bugfix
Fixes the coll_allgather usage bug
2017-03-14 18:42:04 +01:00
Ralph Castain
24e8639826 Platform file update
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 11:11:48 -06:00
Ralph Castain
955fa0456d Merge pull request #3161 from rhc54/topic/cov2
Silence Coverity warnings
2017-03-14 10:10:11 -07:00
Ralph Castain
61a71e25ef Ensure the backend daemons know if we are in a managed allocation and if
the HNP was included in the allocation

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 10:06:43 -07:00
Nathan Hjelm
6b210fa2c4 btl/ugni: do not return a frag from sendi if an endpoint is waitlisted
This fixes a hang that can occur when running bandwidth tests.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 10:14:13 -06:00
Nathan Hjelm
2e42b0afbd btl/ugni: move connection check into sync event
This commit makes datagram checks time based and reduces their
frequency when only the wildcard datagram is posted. This change
improves latency on knl systems.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 10:10:05 -06:00
Nathan Hjelm
3e7ef48c13 pml/ob1: do not cache leave_pinned
This commit fixes a bug that disabled both the RDMA pipeline and RDMA
protocols in ob1. ob1 was internally caching the values of
opal_leave_pinned and opal_leave_pinned_pipeline at init time. This is
no longer valid as opal_leave_pinned may be set by any call to a btl's
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-14 09:00:40 -06:00
Howard Pritchard
5daaf7f3fd ORTED: swat another compiler warning
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-14 08:41:51 -06:00
Ralph Castain
52c9e631de Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 07:30:42 -07:00
Valentin Petrov
fe069c9570 Fixes the coll_allgather usage bug
One should use the correct module object when calling
c_coll.coll_allgather. Otherwise there will be a segfault in the
case, for example, when hcoll is used. In that case
c_coll.coll_allgather = mca_coll_hcoll_allgather while
c_coll.coll_gather_module = tuned.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-03-14 09:47:39 +02:00
Ralph Castain
330b11c8ab Merge pull request #3156 from rhc54/topic/tm
Update the TM module to support regex passing
2017-03-14 00:25:19 -07:00
Ralph Castain
b1a01d77ae Update the TM module to support regex passing
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 21:50:40 -07:00
Nathan Hjelm
9410574253 Merge pull request #3149 from hjelmn/btl_ugni_2_0
Improve multi-threaded RMA performance of the ugni btl
2017-03-13 16:28:41 -06:00
Ralph Castain
e4a35f2dbf Merge pull request #3152 from rhc54/topic/setup
Update launchers to get correct regex
2017-03-13 14:23:43 -07:00
Nathan Hjelm
d5aaeb74b6 btl/ugni: return a descriptor from sendi
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:56:54 -06:00
Nathan Hjelm
a19e7023d1 btl/ugni: always check local SMSG CQ
This commit removes the local operation count check from the local SMSG
completion queue. This check was leading to hangs due to an undocumented
feature of the ugni library. The local SMSG CQ is used to send credit
return messages back to the sender. The ugni library never checks for
the completion itself but relies on the SMSG user to periodically
check the CQ.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:56:54 -06:00
Nathan Hjelm
d5cdeb81d0 btl/ugni: improve multi-threaded performance
This commit updates the ugni btl to make use of multiple device
contexts to improve the multi-threaded RMA performance. This commit
contains the following:

 - Clean up the endpoint structure by removing unnecessary fields. The
   structure now also contains all the fields originally handled by the
   common/ugni endpoint.

 - Clean up the fragment allocation code to remove the need to
   initialize the my_list member of the fragment structure. This
   member is not initialized by the free list initializer function.

 - Remove the (now unused) common/ugni component. btl/ugni no longer
   needs the component. common/ugni was originally split out of
   btl/ugni to support bcol/ugni. As that component no longer exists there is no
   reason to keep this component.

 - Create wrappers for the ugni functionality required by
   btl/ugni. This was done to ease supporting multiple device
   contexts. The wrappers are thread safe and currently use a spin
   lock instead of a mutex. This produces better performance when
   using multiple threads spread over multiple cores. In the future
   this lock may be replaced by another serialization mechanism. The
   wrappers are located in a new file: btl_ugni_device.h.

 - Remove unnecessary device locking from serial parts of the ugni
   btl. This includes the first add-procs and module finalize.

 - Clean up fragment wait list code by moving enqueue into a common
   function.

 - Expose the communication domain flags as an MCA variable. The
   defaults have been updated to reflect the recommended setting for
   knl and haswell.

 - Avoid allocating fragments for communication with already
   overloaded peers.

 - Allocate RDMA endpoints dynamically. This is needed to support
   spreading RMA operations across multiple contexts.

 - Add support for spreading RMA communication over multiple ugni
   device contexts. This should greatly improve the threading
   performance when communicating with multiple peers. By default the
   number of virtual devices depends on 1) whether
   opal_using_threads() is set, 2) how many local processes are in the
   job, and 3) how many bits are available in the pid. The last is
   used to ensure that each CDM is created with a unique id.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:46:06 -06:00
Nathan Hjelm
12bf38a25c btl/ugni: add MPI_T performance variables for ugni counters
This commit exposes ugni statistics for use with MPI_T. There is
no overhead to providing these counters.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-13 14:42:58 -06:00
Jeff Squyres
086748bb70 Merge pull request #3102 from omor1/master
Add missing definition of MPI_T_PVAR_SESSION_NULL (resolve #2652)
2017-03-13 15:27:05 -04:00
Ralph Castain
bb574a41df Update launchers to get correct regex
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 11:21:44 -07:00
Ralph Castain
41d7a5c7d9 Merge pull request #3148 from rhc54/topic/cov
Silence Coverity warnings
2017-03-13 11:12:14 -07:00
Howard Pritchard
fac97a474f LICENSE: update according to copyrights in source files
Update dates in the license file for 3.0.0 branch.

[ci skip]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-13 11:43:58 -06:00
Ralph Castain
105fb152e1 Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 08:38:51 -07:00
Ralph Castain
b9f5cab710 Add a minor debug statement
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 18:15:44 -07:00
Gilles Gouaillardet
23d44a5284 sensor/base: initialize orte_sensor_base global variable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-13 09:39:43 +09:00
Ralph Castain
59bcad5f8e Merge pull request #3146 from rhc54/topic/alps
Update alps module to new APIs
2017-03-12 10:35:29 -07:00
Ralph Castain
6d6bc9bd07 Update alps module to new APIs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 09:43:07 -07:00
Ralph Castain
fb27bd1b4a Merge pull request #3143 from rhc54/topic/odls
Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads
2017-03-12 07:29:11 -07:00
Ralph Castain
70591bf4dc Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 20:48:04 -08:00
Ralph Castain
3afadbad89 Merge pull request #3142 from rhc54/topic/sensor
Restore sensor framework
2017-03-11 19:53:45 -08:00
Ralph Castain
ab50665222 Restore sensor framework
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 17:46:32 -08:00
Ralph Castain
74125ecc7a Merge pull request #3141 from rhc54/topic/sync
Sync to latest PMIx master and PMIx reference server
2017-03-11 15:53:18 -08:00
Ralph Castain
c6bc3ccb76 Sync to latest PMIx master and PMIx reference server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 12:50:38 -08:00
Howard Pritchard
df8df0d2f3 Merge pull request #3137 from hppritcha/topic/swap_rmaps_compiler_warning
rmaps/base: swat compiler warning
2017-03-09 15:06:01 -07:00
Howard Pritchard
f8183f71f7 rmaps/base: swat compiler warning
gcc was complaining about variables possibly used uninitialized

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-09 14:30:06 -06:00
Yossi
1a95633e40 Merge pull request #2717 from alex-mikheev/topic/sshmem_ucx
oshmem: sshmem: adds UCX allocator
2017-03-09 12:58:06 +02:00
Jeff Squyres
16ee880c4e README: Remove coll/ml verbiage
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-08 15:58:54 -05:00
Jeff Squyres
17a34b489b Merge pull request #3121 from jsquyres/pr/master/readme-updates-from-2.x
master: README: sync with v2.x
2017-03-08 12:58:19 -05:00
Yossi
327d5a8ac4 Merge pull request #3125 from alex-mikheev/topic/pml_ucx_req_init_fix
ompi: pml ucx: fix persistent request initialization
2017-03-08 19:08:12 +02:00
Ralph Castain
97287f6568 Merge pull request #2916 from rhc54/topic/sim
Create an alternative mapping method
2017-03-08 07:08:51 -08:00
Jeff Squyres
dc12ae008b Merge pull request #3122 from hjelmn/patcher_madvise
memory/patcher: do not hook madvise
2017-03-08 09:46:45 -05:00