Commit graph

28308 commits

Author SHA1 Message Date
Yossi Itigin
9bf9125ac7
Merge pull request #4802 from alex-mikheev/topic/oshmem_a2as_fix
oshmem: scoll: fixes strided alltoall
2018-02-19 11:17:00 +02:00
Alex Mikheev
cca67a69ea oshmem: scoll: fixes strided alltoall
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-02-19 09:41:21 +02:00
Artem Polyakov
cd35c493da
Merge pull request #4826 from artpol84/modex/rm_thr_level
ompi: ompi_mpi_init(): do not export threading level to modex.
2018-02-17 11:29:45 -08:00
Artem Polyakov
b601dd504a ompi: ompi_mpi_init(): do not export threading level to modex.
For some of our configurations this flag increases the per-process contribution
by ~20% while it is not currently being used.

The consumer of this flag was communicator ID calculation logic, but it was
changed in 0bf06de3f1.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2018-02-18 01:40:15 +07:00
Jeff Squyres
c24a303cb2
Merge pull request #4825 from jsquyres/pr/usnic-btl-version
btl/usnic: update BTL_VERSION handling
2018-02-17 08:27:07 -05:00
Jeff Squyres
d36648b547 btl/usnic: update BTL_VERSION handling
Follow-on to 8097d09858: now that BTL_VERSION is defined in btl.h, be
a little smarter about whether we define it or not.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-02-16 13:20:36 -08:00
Ralph Castain
6fb6aedd5a
Merge pull request #4824 from rhc54/topic/usnic
Silence usnic warnings - BTL version has changed
2018-02-16 10:47:25 -08:00
Ralph Castain
8097d09858 Silence usnic warnings - BTL version has changed
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-16 10:00:18 -08:00
Artem Polyakov
7333f128f6
Merge pull request #4815 from artpol84/slurm/plm_fix
plm/slurm:
2018-02-15 12:45:43 -08:00
Nathan Hjelm
9aa21f4467
Merge pull request #4796 from hjelmn/btl_v3.1
opal/btl: add support for flushing RDMA/atomic operations
2018-02-15 12:44:18 -07:00
Spruit, Neil R
e7bff501cd MTL OFI: Added support for reading multiple CQ events in ofi progress
-Updated ompi_mtl_ofi_progress to use an array to read CQ events up to a
threshold that can be set by the Open MPI User.

-Users can adjust the number of events that can be handled in the
ompi_mtl_ofi_progress by setting "--mca mtl_ofi_progress_event_cnt #".

-The default value for the number of CQ events that can be read in a
single call to ofi progress is 100, which is an average
based on workload use-case analysis showing 70-128 as the range of
multiple events returned during ofi progress.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-02-15 09:41:14 -05:00
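A minimal sketch of the batched read described in e7bff501cd, assuming a mock completion queue rather than the real libfabric objects. Every name here (sketch_cq_t, sketch_cq_read, sketch_ofi_progress, SKETCH_EVENT_CNT) is hypothetical and only illustrates reading up to a threshold of events into an array in one progress call; it is not the code from ompi_mtl_ofi_progress.

```c
#include <stddef.h>

/* Assumed threshold; the commit message gives 100 as the default. */
#define SKETCH_EVENT_CNT 100

/* Hypothetical mock types, not libfabric structures. */
typedef struct { int id; } sketch_cq_entry_t;
typedef struct { int pending; } sketch_cq_t;

/* Mock read: copies up to `count` pending events into `buf` and
 * returns how many were copied. */
static size_t sketch_cq_read(sketch_cq_t *cq, sketch_cq_entry_t *buf, size_t count)
{
    size_t n = 0;
    while (n < count && cq->pending > 0) {
        buf[n].id = cq->pending;
        cq->pending--;
        n++;
    }
    return n;
}

/* One progress call: a single read of up to SKETCH_EVENT_CNT events,
 * then a loop that handles each event that was returned. */
int sketch_ofi_progress(sketch_cq_t *cq)
{
    sketch_cq_entry_t events[SKETCH_EVENT_CNT];
    size_t n = sketch_cq_read(cq, events, SKETCH_EVENT_CNT);
    int handled = 0;

    for (size_t i = 0; i < n; i++) {
        (void)events[i].id;   /* a real handler would run the completion callback */
        handled++;
    }
    return handled;
}
```

With 250 pending events, successive calls in this sketch handle 100, 100, and 50 events, so the threshold bounds the work done per call.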
Gilles Gouaillardet
dd24c746dc output-filename: cleanup obsolete code.
Since output-filename has been moved to a per-job attribute,
remove the orte_output_filename global variable, and stop passing
this option to orted.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-15 10:40:44 +09:00
Artem Polyakov
ab8bb4b0a3 plm/slurm:
Sync command line output for Slurm with RSH launcher.
Currently the Slurm launch cmdline is visible only in debug mode, while for RSH
it is always enabled.
The cmdline is useful for troubleshooting and should be enabled.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2018-02-15 03:09:18 +07:00
Ralph Castain
cde68c9306
Merge pull request #4809 from rhc54/topic/outputfile
Ensure that output-filename is passed as an absolute path
2018-02-13 17:45:45 -08:00
Ralph Castain
af07b3df89 Update help and man pages for output-filename
Warn that relative path will be converted to absolute path, meaning that the file system on remote nodes must be the same as on the node where mpirun is executed.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-13 15:33:33 -08:00
Nathan Hjelm
072a6a4850 opal/btl: add support for flushing RDMA/atomic operations
This commit adds a new optional function to the BTL module:
btl_flush. This function takes an optional BTL endpoint. When called
this function completes all outstanding RDMA and atomic operations
started prior to the call to btl_flush.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-13 12:49:51 -07:00
Nathan Hjelm
6ef9d11cdb
Merge pull request #4812 from hjelmn/thread_perf
coll/libnbc: do not take lock in progress if there are no requests
2018-02-13 11:48:57 -07:00
Nathan Hjelm
0e83568466 coll/libnbc: do not take lock in progress if there are no requests
This commit fixes a flaw in the progress function for libnbc. The
function was unconditionally taking a lock even if there are no
requests to process. This lock was showing up in vtune traces of
multi-threaded benchmarks.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-13 09:51:01 -07:00
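The idea behind 0e83568466 can be sketched as follows, assuming an atomic counter of queued requests that is checked before the lock is taken. The names (sketch_add_request, sketch_nbc_progress, active_requests, lock_acquisitions) are hypothetical stand-ins, not libnbc's real data structures, and the counter of lock acquisitions exists only to make the effect observable.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; not the real libnbc state. */
static pthread_mutex_t active_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int active_requests = 0;  /* number of queued requests */
static int lock_acquisitions = 0;       /* instrumentation for this example only */

/* Called when a nonblocking collective is started. */
void sketch_add_request(void)
{
    atomic_fetch_add(&active_requests, 1);
}

/* Progress function: returns the number of requests completed. */
int sketch_nbc_progress(void)
{
    int completed;

    /* Fast path: nothing is queued, so the lock is never touched. */
    if (atomic_load(&active_requests) == 0) {
        return 0;
    }

    pthread_mutex_lock(&active_lock);
    lock_acquisitions++;
    /* For illustration, pretend every queued request completes in this pass. */
    completed = atomic_exchange(&active_requests, 0);
    pthread_mutex_unlock(&active_lock);

    return completed;
}
```

In this sketch an idle progress call costs one atomic load instead of a lock/unlock pair, which is the kind of overhead the commit message reports seeing in multi-threaded traces.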
Edgar Gabriel
75c6db867b
Merge pull request #4808 from edgargabriel/pr/ompi_req_free_fix
io/ompio: correctly reset the request handle
2018-02-13 10:20:39 -06:00
Ralph Castain
02e19a1c4f Ignore generated file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-13 07:30:00 -08:00
Ralph Castain
f5c3239290 Ensure that output-filename is passed as an absolute path
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-13 07:28:42 -08:00
Edgar Gabriel
a3a734b6d2 io/ompio: correctly reset the request
after performing the final OBJ_RELEASE on the request,
reset the user level variable to MPI_REQUEST_NULL.
Otherwise the c_2_f translation step in the fortran
interface fails.

Fixes issue #4807

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-02-13 09:18:25 -06:00
Jeff Squyres
e7f91f8068
Merge pull request #4527 from clementFoyer/osc-no-includes
Remove inter-dependencies between OSC modules.
2018-02-09 15:49:56 -05:00
Gilles Gouaillardet
9121eb4ff9 opal/lifo: fix an ABA problem in opal_lifo_pop_atomic
that was introduced in open-mpi/ompi@11bb8b09a0

Fixes open-mpi/ompi#4784

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-09 14:48:54 +09:00
Ralph Castain
efd715ed85
Merge pull request #4760 from rhc54/topic/map
Correct mapping errors
2018-02-08 10:46:10 -08:00
Nathan Hjelm
168e74d0cf
Merge pull request #4798 from hjelmn/ob1_fix
pml/ob1: ignore the eager limit of RDMA-only btls
2018-02-08 09:21:24 -07:00
Ralph Castain
1a7dfd7d54 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-07 12:16:51 -08:00
Nathan Hjelm
da9f833f4a pml/ob1: ignore the eager limit of RDMA-only btls
This commit fixes a flaw in the eager limit check in pml/ob1. The
check was incorrectly testing whether RDMA-only BTLs (BTLs without the
send flag) have a valid eager limit. This commit fixes the check by
adding an additional check for the send flag on the BTL module.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-07 12:42:44 -07:00
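A hedged sketch of the corrected check from da9f833f4a, assuming a simplified BTL description with a flags word and an eager limit. The struct, the flag constants, and the min_eager threshold are hypothetical and chosen for illustration; they are not the definitions from btl.h or the actual pml/ob1 code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define SKETCH_BTL_FLAG_SEND 0x1u
#define SKETCH_BTL_FLAG_RDMA 0x2u

/* Hypothetical, simplified view of a BTL module. */
typedef struct {
    uint32_t flags;
    size_t   eager_limit;
} sketch_btl_t;

/* Returns true if the BTL passes the eager limit check.
 * min_eager is an assumed minimum (for example, a header size). */
bool sketch_eager_limit_ok(const sketch_btl_t *btl, size_t min_eager)
{
    /* RDMA-only BTLs never send eagerly, so their eager limit is ignored. */
    if (!(btl->flags & SKETCH_BTL_FLAG_SEND)) {
        return true;
    }
    return btl->eager_limit >= min_eager;
}
```

Without the send-flag test, an RDMA-only BTL with an eager limit of 0 would be rejected in this sketch even though the limit is never used.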
Ralph Castain
cb221b6f6f Correct mapping errors
Since we now support the dynamic addition of hosts to the orte_node_pool, there is no longer any reason to require advanced specification of all possible nodes. Instead, use a precedence method to initially allocate only those hosts that were specified in the cmd line:

* rankfile, if given, as that will specify the nodes

* -host, aggregated across all app_contexts

* -hostfile, aggregated across all app_contexts

* default hostfile

* assign local node

Fix slots_inuse accounting so that the nodes are correctly reset upon error termination - e.g., when oversubscribed without permission.

Ensure we accurately track the user's specified desires for oversubscribe and no-use-local when dynamically spawning jobs.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit c9b3e68ce596a68a2ed2fbf73f211b3334b0a6a8)
2018-02-07 11:29:21 -08:00
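The precedence list in cb221b6f6f can be read as "first available source wins". The sketch below encodes that reading; it is an assumption about the semantics, and the types and names (sketch_alloc_input_t, sketch_alloc_source_t, sketch_pick_source) are hypothetical, not ORTE structures.

```c
#include <stdbool.h>

/* Hypothetical summary of what the user supplied on the command line. */
typedef struct {
    bool has_rankfile;
    bool has_host;              /* -host on any app_context */
    bool has_hostfile;          /* -hostfile on any app_context */
    bool has_default_hostfile;
} sketch_alloc_input_t;

typedef enum {
    SKETCH_SRC_RANKFILE,
    SKETCH_SRC_HOST,
    SKETCH_SRC_HOSTFILE,
    SKETCH_SRC_DEFAULT_HOSTFILE,
    SKETCH_SRC_LOCAL_NODE
} sketch_alloc_source_t;

/* Walks the precedence list from the commit message, highest first,
 * and returns the first source that is present. */
sketch_alloc_source_t sketch_pick_source(const sketch_alloc_input_t *in)
{
    if (in->has_rankfile)         return SKETCH_SRC_RANKFILE;
    if (in->has_host)             return SKETCH_SRC_HOST;
    if (in->has_hostfile)         return SKETCH_SRC_HOSTFILE;
    if (in->has_default_hostfile) return SKETCH_SRC_DEFAULT_HOSTFILE;
    return SKETCH_SRC_LOCAL_NODE; /* nothing specified: use the local node */
}
```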
Clement Foyer
f5b4fc05f8 Remove inter-dependencies between OSC modules.
The osc monitoring component needed to include the headers of other OSC
components in order to access the communicator through the
component-specific ompi_osc_*_module_t structures. This commit removes
the dependency and resolves issue #4523.

Extend the common monitoring API.

  * Now it's possible to translate from local rank to world rank from
    both the communicator and the group.
  * Remove useless hashtable as we directly use the w_group contained
    in window structure.

Add automatic generation at config time.

The templates are expanded at configure time. This creates a new header
file that generates all the variables/functions needed. Adding this
during autogen automagically generates the proper functions for each of
the available modules.

Only keep a generated argv-style array.

Following Jeff's advice, the configure.m4 file generates a simple array
of module variables to be iterated over to find the proper module.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
2018-02-07 11:52:00 +00:00
Ralph Castain
4f13dbc15e
Merge pull request #4792 from rhc54/topic/badexe
Ensure we fail if remote nodes cannot find executable
2018-02-05 20:31:46 -08:00
Ralph Castain
ce901ba247 Ensure we fail if remote nodes cannot find executable
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-05 19:31:43 -08:00
Ralph Castain
71980fe8e5
Merge pull request #4786 from rhc54/topic/dmdx
ORTE-side fix of request for job info from unknown nspace
2018-02-04 11:14:07 -08:00
Ralph Castain
10be1df1d3 Remove debug and add target/probe programs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 9a03007115fc8978f4eb5fd938c05b26adbd433e)
2018-02-03 20:06:18 -08:00
Ralph Castain
9fe8153d38 Sync to IOF branch and continue fix of request for job info from unknown nspace
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 02400d30d79ce3c7e7e28f9a08f7062a5b6f4c51)
2018-02-03 19:56:35 -08:00
Ralph Castain
73a9a4f8c7
Merge pull request #4785 from rhc54/topic/cleanup
Silence warnings
2018-02-03 05:36:14 -08:00
Ralph Castain
73ef976ead Silence warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-03 00:29:06 -08:00
Howard Pritchard
1adf0873f7
Merge pull request #4766 from hppritcha/topic/squash_grdma_comp_warning
rcache/grdma: squash a compiler warning
2018-02-02 18:58:32 -07:00
Artem Polyakov
4add7cd5f5
Merge pull request #4781 from karasevb/fix_rmaps_nodelist
rmaps: fixed the ordering of `mpirun` target nodes
2018-02-02 12:29:15 -08:00
Boris Karasev
52e81ee4b1 rmaps: fixed the ordering of mpirun target nodes
Fixed the desync of job nodelists between mpirun and the orted
daemons. The issue was observed when launching via RSH, because the user
can provide nodes in an arbitrary order relative to the HNP placement.
The mpirun process propagates the daemons' nodelist order to the nodes.
The problem was that the HNP itself assembles the nodelist based on the
user-provided order. As a result, rank assignment was calculated
differently on orted and mpirun.

Consider the following example:
* User launches mpirun on node cn2.
* Hostlist is cn1,cn2,cn3,cn4; ppn=1
* mpirun passes the hostlist cn[2:2,1,3-4]@0(4) to the orteds
As a result, mpirun will assign rank 0 to cn1 while orted will assign
rank 0 to cn2 (because orted sees cn2 as the first element in the node
list).

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-02-01 17:16:05 +02:00
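The mismatch in the example from 52e81ee4b1 can be reproduced with a small sketch, assuming ranks are placed one per node (ppn=1) by walking a node list in order. The helper names (sketch_daemon_order, sketch_node_for_rank) are hypothetical; the daemon-order rule used here (HNP node first, the rest in user order) is inferred from the hostlist in the commit message, not taken from the rmaps code.

```c
#include <string.h>

/* Returns the node hosting `rank` when one process is placed per node
 * by walking the list in order; NULL if the rank is out of range. */
const char *sketch_node_for_rank(const char *nodes[], int num_nodes, int rank)
{
    if (rank < 0 || rank >= num_nodes) {
        return NULL;
    }
    return nodes[rank];
}

/* Builds the daemon-order list: the HNP node first, then the remaining
 * nodes in user order. `out` must have room for n + 1 entries.
 * Returns the number of entries written. */
int sketch_daemon_order(const char *user[], int n, const char *hnp, const char *out[])
{
    int k = 0;
    out[k++] = hnp;
    for (int i = 0; i < n; i++) {
        if (strcmp(user[i], hnp) != 0) {
            out[k++] = user[i];
        }
    }
    return k;
}
```

For the hostlist cn1,cn2,cn3,cn4 with mpirun on cn2, the user-order list puts rank 0 on cn1 while the daemon-order list puts rank 0 on cn2, which is the divergence the commit describes.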
Ralph Castain
bc1d7ff2cc
Merge pull request #4780 from ggouaillardet/topic/ext3x
pmix/ext3x: remove autogenerated ext3x.h header file
2018-01-31 08:18:34 -08:00
Gilles Gouaillardet
43700faba1 pmix/ext3x: remove autogenerated ext3x.h header file
This header file was meant to be autogenerated and, for
some reason, was never removed from the repository.
Update .gitignore as well.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 23:45:42 +09:00
Ralph Castain
7ddffc627d
Merge pull request #4776 from rhc54/topic/rte
Correct abstraction break and update ignores
2018-01-31 04:34:05 -08:00
Gilles Gouaillardet
9dcb7ab317
Merge pull request #4772 from ggouaillardet/topic/osc_sm_free
osc/sm: fix the osc_free callback
2018-01-31 17:06:36 +09:00
Ralph Castain
982415749c Update ignores for pmix/ext3x component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-30 21:41:55 -08:00
Ralph Castain
8e8a9aecc5 Correct abstraction break - direct reference to ORTE
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-30 21:19:14 -08:00
Ralph Castain
0c5bb999ed
Merge pull request #4775 from ggouaillardet/topic/ext3x
pmix/ext3x: bring external component up-to-date with the embedded pmix3x
2018-01-30 21:18:48 -08:00
Gilles Gouaillardet
8209fca842 pmix/ext3x: bring external component up-to-date with the embedded pmix3x
add the callback prototype for the upcoming PMIx_IOF_push() API

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:35:34 +09:00
Gilles Gouaillardet
0481277e93 pmix/ext3x: bring external component up-to-date with the embedded pmix3x
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:33:33 +09:00
Nathan Hjelm
bb212e0c94
Merge pull request #4767 from ggouaillardet/topic/vader_backing_file
btl/vader: make the backing file job specific
2018-01-30 21:27:02 -07:00