Commit graph

28236 commits

Author SHA1 Message Date
Ralph Castain
17c40f4cea Implement support for proctable queries
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-02 02:00:31 -08:00
Ralph Castain
0434b615b5 Update ORTE to support PMIx v3
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":

* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch.

* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there; this includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.

* Fabric integration for "instant on". Enables collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place; implementation will come later.

* Harvesting and forwarding of envars. Enables network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included and can also exclude specific envars.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-02 02:00:31 -08:00
Gilles Gouaillardet
f15d6200af
Merge pull request #4832 from ggouaillardet/topic/vader_process_vm
btl/vader: handle unexpected short read/write in process_vm_{read,write}v
2018-03-02 13:59:51 +09:00
Gilles Gouaillardet
9fedf2836e btl/vader: handle unexpected short read/write in process_vm_{read,write}v
Important note:

According to the man page:
"On success, process_vm_readv() returns the number of bytes read and
process_vm_writev() returns the number of bytes written.  This return
value may be less than the total number of requested bytes, if a
partial read/write occurred.  (Partial transfers apply at the
granularity of iovec elements.  These system calls won't perform a
partial transfer that splits a single iovec element.)"

Since we use a single iovec element, the returned size should be either
0 or size, and the do loop should not be needed here.
We tested various Linux kernels with size > 2 GB and, surprisingly,
the returned value is always 0x7ffff000 (for what it's worth, this happens
to be the largest whole number of 4 KiB pages that fits in a signed
32-bit integer).
We do not know whether this is a bug in the kernel, the libc or even
the man page, but for the time being, we proceed as if process_vm_readv()
could return any value.

Thanks Heiko Bauke for the bug report.

Refs. open-mpi/ompi#4829

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-03-02 13:20:46 +09:00
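The "treat any return value as possible" approach from the commit above amounts to a retry loop over short transfers. The following is a minimal, self-contained sketch of that loop, not the btl/vader code: the `xfer_fn` callback type, `xfer_all()` and the capped mock `capped_memcpy()` are hypothetical names, and the mock stands in for `process_vm_readv()`/`process_vm_writev()` called with one local and one remote iovec. The compile-time check confirms that 0x7ffff000 equals INT32_MAX rounded down to a 4 KiB boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Observed per-call limit from the commit message: 0x7ffff000 bytes, the
 * largest whole number of 4 KiB pages that fits in a signed 32-bit int. */
#define OBSERVED_MAX_XFER ((size_t) 0x7ffff000)
_Static_assert(OBSERVED_MAX_XFER == ((size_t) INT32_MAX & ~(size_t) 4095),
               "0x7ffff000 is INT32_MAX rounded down to a 4 KiB boundary");

/* Hypothetical single-iovec transfer primitive: move up to len bytes from
 * src to dst and return the number of bytes actually moved, or -1 on error. */
typedef ssize_t (*xfer_fn)(void *dst, const void *src, size_t len);

/* Call fn repeatedly until all len bytes have been transferred, as if any
 * short count were possible. Returns 0 on success, -1 on error, on a
 * nonsensical count, or when a call makes no progress (avoids looping). */
static int xfer_all(xfer_fn fn, void *dst, const void *src, size_t len)
{
    char *d = dst;
    const char *s = src;

    while (len > 0) {
        ssize_t n = fn(d, s, len);
        if (n <= 0 || (size_t) n > len) {
            return -1;
        }
        d += n;
        s += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Mock transfer that never moves more than mock_cap bytes per call, standing
 * in for the kernel's short transfers without needing a 2 GB buffer. */
static size_t mock_cap = 4096;
static int mock_calls = 0;

static ssize_t capped_memcpy(void *dst, const void *src, size_t len)
{
    size_t n = len < mock_cap ? len : mock_cap;
    memcpy(dst, src, n);
    mock_calls++;
    return (ssize_t) n;
}
```

With the real system call, each iteration would advance the `iov_base` of both iovecs by the returned count and call `process_vm_readv(pid, &local, 1, &remote, 1, 0)` again with the remaining length.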
Nathan Hjelm
5ed2fc2d48 mca/base: add support for additional variable types
This commit adds long, int32_t, uint32_t, int64_t, and uint64_t as
possible MCA variable types.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-03-01 20:42:27 -07:00
Josh Hursey
6f589546d3
Merge pull request #4867 from sam6258/prefix_dir_fix
Fix PATH and LD_LIBRARY_PATH prefixing to use first app context value…
2018-03-01 10:25:04 -06:00
Scott Miller
d7e594fcff Fix PATH and LD_LIBRARY_PATH prefixing to use first app context value for ORTE_APP_PREFIX_DIR
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
2018-02-28 18:41:47 -05:00
bosilca
9944d63de1
Merge pull request #4852 from thananon/pr/ob1_oos_fix
pml/ob1: fixed out of sequence bug.
2018-02-28 13:02:03 -05:00
Thananon Patinyasakdikul
09cba8b30b pml/ob1: fixed out of sequence bug.
This commit fixes #4795

- Fixed a typo that sometimes caused a deadlock when changing protocol.
- Redesigned out-of-sequence ordering and addressed the overflow case of
  the uint16_t sequence number.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2018-02-27 13:49:40 -05:00
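Handling the uint16_t overflow mentioned in the ob1 fix above requires comparing sequence numbers in circular (wraparound) order. The sketch below illustrates that idea under the assumption that the sender and receiver never drift more than 32767 numbers apart; it is not the ob1 implementation, and `seq_before()`, `seq_classify()` and the `seq_class` enum are hypothetical names. With this comparison, 65535 counts as coming just before 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a comes strictly before b in circular uint16_t
 * space, i.e. b is between 1 and 32767 steps ahead of a. The difference is
 * reduced modulo 2^16 by the cast, so no signed overflow is involved. */
static bool seq_before(uint16_t a, uint16_t b)
{
    uint16_t ahead = (uint16_t) (b - a);
    return ahead != 0 && ahead < 0x8000u;
}

/* Hypothetical classification of an incoming fragment relative to the next
 * expected sequence number. */
enum seq_class {
    SEQ_IN_ORDER,   /* deliver now */
    SEQ_AHEAD,      /* arrived early: queue until the gap is filled */
    SEQ_BEHIND      /* older than expected: duplicate or stale */
};

static enum seq_class seq_classify(uint16_t expected, uint16_t incoming)
{
    if (incoming == expected) {
        return SEQ_IN_ORDER;
    }
    return seq_before(expected, incoming) ? SEQ_AHEAD : SEQ_BEHIND;
}
```

A plain `a < b` comparison would misclassify a fragment numbered 0 that arrives while 65535 is expected as stale; the modular difference treats it as one step ahead.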
Nathan Hjelm
5380d7cce5 mpool/hugepage: add missing header
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-26 13:35:56 -07:00
Nathan Hjelm
38d9b10db8 rcache/base: update VMA tree to use opal_interval_tree_t
This commit replaces the current VMA tree implementation with one that
uses the new opal_interval_tree_t class. Since the VMA tree lock is no
longer used, this commit also updates rcache/grdma and btl/vader to
take better care when searching for existing registrations.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-26 13:35:56 -07:00
Nathan Hjelm
7163fc98a0 opal/class: add a new class: opal_interval_tree_t
This commit adds a new class to opal: opal_interval_tree_t. This is a
thread-safe implementation of a 1-dimensional interval tree. The data
structure is intended to provide a faster implementation of the
registration cache VMA tree.

The thread safety is provided by a relativistic red-black tree
implementation. This structure supports multiple readers and a
single writer. There is one caveat: an item may appear in the tree
twice while the tree is being updated. Care needs to be taken to avoid
issues associated with this "feature". I don't anticipate a problem
with the current VMA tree usage.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-26 13:35:56 -07:00
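To illustrate the query an interval tree answers, here is a deliberately simplified sketch: an unbalanced, single-threaded binary search tree keyed on each interval's low end and augmented with the maximum high end of every subtree (the classic textbook interval tree). It is not the relativistic red-black tree described in the commit above and provides none of its thread safety; `struct itree_node`, `itree_insert()` and `itree_find_overlap()` are hypothetical names. A registration-cache lookup would ask whether any registered [base, base + size - 1] range overlaps a requested address range.

```c
#include <stddef.h>
#include <stdint.h>

/* Node of an unbalanced, single-threaded interval tree. Nodes are ordered by
 * low; max_high is the largest high in this node's subtree, which lets a
 * search skip subtrees that cannot contain an overlap. */
struct itree_node {
    uint64_t low, high;          /* closed interval [low, high] */
    uint64_t max_high;
    struct itree_node *left, *right;
};

/* Insert caller-allocated node n and return the new subtree root. */
static struct itree_node *itree_insert(struct itree_node *root,
                                       struct itree_node *n)
{
    if (root == NULL) {
        n->left = n->right = NULL;
        n->max_high = n->high;
        return n;
    }
    if (n->low < root->low) {
        root->left = itree_insert(root->left, n);
    } else {
        root->right = itree_insert(root->right, n);
    }
    if (n->high > root->max_high) {
        root->max_high = n->high;
    }
    return root;
}

/* Return some node overlapping [low, high], or NULL if none does. */
static struct itree_node *itree_find_overlap(struct itree_node *root,
                                             uint64_t low, uint64_t high)
{
    while (root != NULL) {
        if (root->low <= high && low <= root->high) {
            return root;
        }
        /* Standard interval-tree argument: if the left subtree reaches low,
         * any overlap must lie there; otherwise only the right subtree can
         * still hold one. */
        if (root->left != NULL && root->left->max_high >= low) {
            root = root->left;
        } else {
            root = root->right;
        }
    }
    return NULL;
}
```

Balancing (for example, red-black rotations that also recompute `max_high`) keeps the search logarithmic; the concurrent reader/writer scheme in the commit is a separate layer not modeled here.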
Ralph Castain
5c59876451
Merge pull request #4864 from ggouaillardet/topic/pmix_configury
configury: look for PMI header in DIR provided by --with-pmi=DIR
2018-02-25 18:10:39 -08:00
Gilles Gouaillardet
83dd8cd3fc configury: look for PMI header in DIR provided by --with-pmi=DIR
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-26 09:46:26 +09:00
Gilles Gouaillardet
75acb172a8
Merge pull request #4853 from ggouaillardet/topic/configury_pmi
configury: fix PMI detection
2018-02-25 19:04:57 +09:00
Gilles Gouaillardet
b86e0f04bf configury: fix PMI detection
and do not end up with -L/usr/lib[64] when PMI libraries
are installed in the default location.

Thanks Davide Vanzo for the report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-24 01:16:12 +09:00
valentin petrov
40e0ae7326
Merge pull request #4850 from vspetrov/master
coll/hcoll: Fix return codes
2018-02-22 21:26:47 +03:00
Valentin Petrov
bf4e694a96 coll/hcoll: Fix return codes
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2018-02-22 17:48:29 +02:00
Matias Cabral
0a822f8f99
Merge pull request #4821 from nrspruit/OFI_mtl_multi_event_progress
MTL OFI: Added support for reading multiple CQ events in ofi progress
2018-02-20 14:59:47 -08:00
Jeff Squyres
c0c70a82d8
Merge pull request #4842 from jsquyres/pr/add-so-versioning-to-ompi-common-monitoring-lib
ompi/monitoring: add .so versioning to common monitoring lib
2018-02-20 11:11:50 -05:00
Jeff Squyres
9ef0f3d83a ompi/monitoring: add .so versioning to common monitoring lib
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-02-20 07:07:23 -08:00
Gilles Gouaillardet
bc2ed21229
Merge pull request #4840 from ggouaillardet/topic/oversubscribe
orted_submit: fix the --oversubscribe option
2018-02-20 19:34:29 +09:00
Gilles Gouaillardet
02b97146de orted_submit: fix the --oversubscribe option
do set the ORTE_MAPPING_SUBSCRIBE_GIVEN directive when --oversubscribe is used

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-20 17:15:12 +09:00
Ralph Castain
7d0e02345b
Merge pull request #4836 from rhc54/topic/pmix
Sync to PMIx master
2018-02-19 10:24:52 -08:00
Ralph Castain
60e6440603 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-19 09:20:13 -08:00
Jeff Squyres
271fc6a320
Merge pull request #4834 from jsquyres/pr/usnic-warning-fix
btl/usnic: missed a preprocessor check in d36648b
2018-02-19 10:47:43 -05:00
Jeff Squyres
b452991ad8 btl/usnic: missed a preprocessor check in d36648b
Missed updating one instance of `==` to `>=` in d36648b.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-02-19 07:03:05 -08:00
Yossi Itigin
1b1402299a
Merge pull request #4833 from alex-mikheev/topic/oshmem_gcache_grp_msg_fix
oshmem: increase group cache size to 1000
2018-02-19 14:39:26 +02:00
Alex Mikheev
03a094b9a8
oshmem: increase group cache size to 1000
and fix typos in help messages

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-19 11:50:24 +02:00
Yossi Itigin
9bf9125ac7
Merge pull request #4802 from alex-mikheev/topic/oshmem_a2as_fix
oshmem: scoll: fixes strided alltoall
2018-02-19 11:17:00 +02:00
Alex Mikheev
cca67a69ea oshmem: scoll: fixes strided alltoall
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-19 09:41:21 +02:00
Artem Polyakov
cd35c493da
Merge pull request #4826 from artpol84/modex/rm_thr_level
ompi: ompi_mpi_init(): do not export threading level to modex.
2018-02-17 11:29:45 -08:00
Artem Polyakov
b601dd504a ompi: ompi_mpi_init(): do not export threading level to modex.
For some of our configurations, this flag increases the per-process contribution
by ~20%, even though it is currently unused.

The consumer of this flag was the communicator ID calculation logic, but that
logic was changed in 0bf06de3f1.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2018-02-18 01:40:15 +07:00
Jeff Squyres
c24a303cb2
Merge pull request #4825 from jsquyres/pr/usnic-btl-version
btl/usnic: update BTL_VERSION handling
2018-02-17 08:27:07 -05:00
Jeff Squyres
d36648b547 btl/usnic: update BTL_VERSION handling
Follow-on to 8097d09858: now that BTL_VERSION is defined in btl.h, be
a little smarter about whether we define it or not.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-02-16 13:20:36 -08:00
Ralph Castain
6fb6aedd5a
Merge pull request #4824 from rhc54/topic/usnic
Silence usnic warnings - BTL version has changed
2018-02-16 10:47:25 -08:00
Ralph Castain
8097d09858 Silence usnic warnings - BTL version has changed
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-16 10:00:18 -08:00
Artem Polyakov
7333f128f6
Merge pull request #4815 from artpol84/slurm/plm_fix
plm/slurm:
2018-02-15 12:45:43 -08:00
Nathan Hjelm
9aa21f4467
Merge pull request #4796 from hjelmn/btl_v3.1
opal/btl: add support for flushing RDMA/atomic operations
2018-02-15 12:44:18 -07:00
Spruit, Neil R
e7bff501cd MTL OFI: Added support for reading multiple CQ events in ofi progress
- Updated ompi_mtl_ofi_progress to use an array to read CQ events up to a
  threshold that can be set by the Open MPI user.

- Users can adjust the number of events that can be handled in
  ompi_mtl_ofi_progress by setting "--mca mtl_ofi_progress_event_cnt #".

- The default number of CQ events that can be read in a single call to
  ofi progress is 100, an average based on workload use-case analysis
  showing 70-128 as the range of multiple events returned during ofi
  progress.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-02-15 09:41:14 -05:00
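The batched progress described in the MTL OFI commit above can be sketched as one read call that fills an array with up to a configurable number of completion events. This models only that idea: `struct cq_event`, `cq_read_fn`, `progress_once()` and the mock queue are hypothetical stand-ins, not the MTL's code or libfabric types. A real `fi_cq_read()` likewise takes a buffer and a count, but it returns -FI_EAGAIN rather than 0 when the queue is empty.

```c
#include <stddef.h>

/* Hypothetical completion event; real CQ entries carry more fields. */
struct cq_event {
    int id;
};

/* Hypothetical read primitive: copy up to count events into buf and return
 * the number copied, 0 when empty, or a negative value on error. */
typedef long (*cq_read_fn)(void *cq, struct cq_event *buf, size_t count);

/* One progress call: read up to event_cnt events (the tunable threshold) in a
 * single batch and hand each to handler. Returns the number handled. */
static int progress_once(void *cq, cq_read_fn rd, struct cq_event *buf,
                         size_t event_cnt,
                         void (*handler)(const struct cq_event *))
{
    long n = rd(cq, buf, event_cnt);
    if (n <= 0) {
        return 0;   /* empty queue or error: nothing handled this call */
    }
    for (long i = 0; i < n; i++) {
        handler(&buf[i]);
    }
    return (int) n;
}

/* Mock queue holding 'total' pending events numbered 0..total-1. */
struct mock_cq {
    int next;
    int total;
};

static long mock_read(void *cq, struct cq_event *buf, size_t count)
{
    struct mock_cq *q = cq;
    size_t avail = (size_t) (q->total - q->next);
    size_t n = count < avail ? count : avail;
    for (size_t i = 0; i < n; i++) {
        buf[i].id = q->next++;
    }
    return (long) n;
}

/* Test handler: counts events and remembers the last id seen. */
static int handled = 0;
static int last_id = -1;

static void count_handler(const struct cq_event *ev)
{
    handled++;
    last_id = ev->id;
}
```

With the default threshold of 100, draining 250 pending events takes three progress calls (100, 100, 50) instead of one call per event.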
Gilles Gouaillardet
dd24c746dc output-filename: cleanup obsolete code.
Since output-filename has been moved to a per-job attribute,
remove the orte_output_filename global variable, and stop passing
this option to orted.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-15 10:40:44 +09:00
Artem Polyakov
ab8bb4b0a3 plm/slurm:
Sync command-line output for Slurm with the RSH launcher.
Currently the Slurm launch command line is only visible in debug mode, while
for RSH it is always shown.
The command line is useful for troubleshooting and should be enabled.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2018-02-15 03:09:18 +07:00
Ralph Castain
cde68c9306
Merge pull request #4809 from rhc54/topic/outputfile
Ensure that output-filename is passed as an absolute path
2018-02-13 17:45:45 -08:00
Ralph Castain
af07b3df89 Update help and man pages for output-filename
Warn that relative path will be converted to absolute path, meaning that the file system on remote nodes must be the same as on the node where mpirun is executed.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-13 15:33:33 -08:00
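Converting a relative output-filename to an absolute path, as described in the commit above, amounts to prefixing the directory mpirun runs in. The sketch below shows that step; `make_absolute()` is a hypothetical helper, not ORTE's code. It does not normalize `.` or `..` components and, unlike `realpath()`, does not require the path to exist yet.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write an absolute form of path into out (outlen bytes).
 * Absolute paths are copied unchanged; relative ones are prefixed with the
 * current working directory. Returns 0 on success, -1 on error or if the
 * result does not fit in out. */
static int make_absolute(const char *path, char *out, size_t outlen)
{
    int n;

    if (path[0] == '/') {
        n = snprintf(out, outlen, "%s", path);
    } else {
        char cwd[4096];   /* fixed size chosen for this sketch */
        if (getcwd(cwd, sizeof cwd) == NULL) {
            return -1;
        }
        /* Avoid a doubled slash when the working directory is "/". */
        const char *sep = cwd[strlen(cwd) - 1] == '/' ? "" : "/";
        n = snprintf(out, outlen, "%s%s%s", cwd, sep, path);
    }
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

Because the prefix is the working directory on the node where mpirun runs, the resulting path is only meaningful on remote nodes that see the same file system layout, which is exactly the caveat the updated help text warns about.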
Nathan Hjelm
072a6a4850 opal/btl: add support for flushing RDMA/atomic operations
This commit adds a new optional function to the BTL module:
btl_flush. This function takes an optional BTL endpoint. When called
this function completes all outstanding RDMA and atomic operations
started prior to the call to btl_flush.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-13 12:49:51 -07:00
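The flush semantics in the btl_flush commit above can be modeled with a per-endpoint count of in-flight operations and a loop that drives progress until the count drains. This is a single-threaded sketch under that assumption only: `struct ep`, `ep_progress()`, `ep_pending()` and `ep_flush()` are hypothetical and do not reflect the BTL module's actual function signature. Passing NULL as the endpoint flushes all endpoints, mirroring the optional-endpoint argument.

```c
#include <stddef.h>

/* Hypothetical endpoint: only tracks how many RDMA/atomic operations it has
 * in flight. */
struct ep {
    int outstanding;
};

/* Hypothetical progress engine: completes at most one pending operation on
 * each endpoint per call. */
static void ep_progress(struct ep *eps, size_t neps)
{
    for (size_t i = 0; i < neps; i++) {
        if (eps[i].outstanding > 0) {
            eps[i].outstanding--;
        }
    }
}

/* Number of operations still pending on target, or on all endpoints when
 * target is NULL. */
static int ep_pending(const struct ep *eps, size_t neps, const struct ep *target)
{
    if (target != NULL) {
        return target->outstanding;
    }
    int total = 0;
    for (size_t i = 0; i < neps; i++) {
        total += eps[i].outstanding;
    }
    return total;
}

/* Flush sketch: drive progress until nothing is pending on the chosen
 * endpoint(s). Returns the number of progress rounds it took. */
static int ep_flush(struct ep *eps, size_t neps, const struct ep *target)
{
    int rounds = 0;
    while (ep_pending(eps, neps, target) > 0) {
        ep_progress(eps, neps);
        rounds++;
    }
    return rounds;
}
```

With concurrent initiators, a plain counter like this would also wait for operations started after the call, so a real implementation needs more care to limit the wait to earlier operations.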
Nathan Hjelm
6ef9d11cdb
Merge pull request #4812 from hjelmn/thread_perf
coll/libnbc: do not take lock in progress if there are no requests
2018-02-13 11:48:57 -07:00
Nathan Hjelm
0e83568466 coll/libnbc: do not take lock in progress if there are no requests
This commit fixes a flaw in the progress function for libnbc. The
function was unconditionally taking a lock even when there were no
requests to process. This lock was showing up in VTune traces of
multi-threaded benchmarks.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-13 09:51:01 -07:00
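The libnbc fix above follows a common pattern: check a cheap, lock-free indicator of pending work before acquiring the lock. The sketch below assumes a mutex-protected request queue paired with an atomic request count; `struct nbc_queue`, `nbc_progress()` and the `lock_taken` counter (used only to make the fast path observable) are hypothetical, not libnbc's code.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical request queue: the mutex guards the (omitted) request list,
 * while nactive can be read without taking the mutex. */
struct nbc_queue {
    pthread_mutex_t lock;
    atomic_int nactive;
};

/* Progress sketch. The relaxed load may be stale, which is harmless here: a
 * request enqueued concurrently is simply handled by the next call. */
static int nbc_progress(struct nbc_queue *q, int *lock_taken)
{
    if (atomic_load_explicit(&q->nactive, memory_order_relaxed) == 0) {
        return 0;   /* fast path: nothing to do, mutex never touched */
    }

    pthread_mutex_lock(&q->lock);
    (*lock_taken)++;
    /* Stand-in for walking the request list: treat every active request
     * as completed on this pass. */
    int completed = atomic_exchange(&q->nactive, 0);
    pthread_mutex_unlock(&q->lock);
    return completed;
}
```

In a multi-threaded run where most progress calls find no non-blocking collectives pending, the fast path keeps those calls from contending on the mutex.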
Edgar Gabriel
75c6db867b
Merge pull request #4808 from edgargabriel/pr/ompi_req_free_fix
io/ompio: correctly reset the request handle
2018-02-13 10:20:39 -06:00
Ralph Castain
02e19a1c4f Ignore generated file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-13 07:30:00 -08:00
Ralph Castain
f5c3239290 Ensure that output-filename is passed as an absolute path
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-13 07:28:42 -08:00