1
1
Граф коммитов

28236 Коммитов

Автор SHA1 Сообщение Дата
Edgar Gabriel
a3a734b6d2 io/ompio: correctly reset the request
after performing the final OBJ_RELEASE on the request,
reset the user level variable to MPI_REQUEST_NULL.
Otherwise the c_2_f translation step in the fortran
interface fails.

Fixes issue #4807

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-02-13 09:18:25 -06:00
Jeff Squyres
e7f91f8068
Merge pull request #4527 from clementFoyer/osc-no-includes
Remove inter-dependencies between OSC modules.
2018-02-09 15:49:56 -05:00
Gilles Gouaillardet
9121eb4ff9 opal/lifo: fix a ABA problem in opal_lifo_pop_atomic
that was introduced in open-mpi/ompi@11bb8b09a0

Fixes open-mpi/ompi#4784

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-09 14:48:54 +09:00
Ralph Castain
efd715ed85
Merge pull request #4760 from rhc54/topic/map
Correct mapping errors
2018-02-08 10:46:10 -08:00
Nathan Hjelm
168e74d0cf
Merge pull request #4798 from hjelmn/ob1_fix
pml/ob1: ignore the eager limit of RDMA-only btls
2018-02-08 09:21:24 -07:00
Ralph Castain
1a7dfd7d54 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-07 12:16:51 -08:00
Nathan Hjelm
da9f833f4a pml/ob1: ignore the eager limit of RDMA-only btls
This commit fixes a flaw in the eager limit check in pml/ob1. The
check was incorrectly checking if RDMA-only BTLs (BTLs without the
send flag) has a valid eager limit. This commit fixes the check by
adding an additional check for the send flag on the BTL module.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-02-07 12:42:44 -07:00
Ralph Castain
cb221b6f6f Correct mapping errors
Since we now support the dynamic addition of hosts to the orte_node_pool, there is no longer any reason to require advanced specification of all possible nodes. Instead, use a precedence method to initially allocate only those hosts that were specified in the cmd line:

* rankfile, if given, as that will specify the nodes

* -host, aggregated across all app_contexts

* -hostfile, aggregated across all app_contexts

* default hostfile

* assign local node

Fix slots_inuse accounting so that the nodes are correctly reset upon error termination - e.g., when oversubscribed without permission.

Ensure we accurately track the user's specified desires for oversubscribe and no-use-local when dynamically spawning jobs.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit c9b3e68ce596a68a2ed2fbf73f211b3334b0a6a8)
2018-02-07 11:29:21 -08:00
Clement Foyer
f5b4fc05f8 Remove inter-dependencies between OSC modules.
The osc monitoring component needed to include other OSC components
header in order to be able tu access communicator through the
component specific ompi_osc_*_module_t structures. This commit remove
the dependency, and resolve the issue #4523.

Extend the common monitoring API.

  * Now it's possible to translate from local rank to world rank from
    both the communicator and the group.
  * Remove useless hashtable as we directly use the w_group contained
    in window structure.

Add automatic generation at config time.

The templates are expanded at configure time. It creates a new header
file that generates all the variables/functions needed. Adding this
during the autogen automagicaly generates for each of the available
modules the proper functions.

Only keep a generated argv-style array.

Following Jeff's advice, the configure.m4 file generate a simple array
of module variables to be iterated over to find the proper module.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
2018-02-07 11:52:00 +00:00
Ralph Castain
4f13dbc15e
Merge pull request #4792 from rhc54/topic/badexe
Ensure we fail if remote nodes cannot find executable
2018-02-05 20:31:46 -08:00
Ralph Castain
ce901ba247 Ensure we fail if remote nodes cannot find executable
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-05 19:31:43 -08:00
Ralph Castain
71980fe8e5
Merge pull request #4786 from rhc54/topic/dmdx
ORTE-side fix of request for job info from unknown nspace
2018-02-04 11:14:07 -08:00
Ralph Castain
10be1df1d3 Remove debug and add target/probe programs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 9a03007115fc8978f4eb5fd938c05b26adbd433e)
2018-02-03 20:06:18 -08:00
Ralph Castain
9fe8153d38 Sync to IOF branch and continue fix of request for job info from unknown nspace
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 02400d30d79ce3c7e7e28f9a08f7062a5b6f4c51)
2018-02-03 19:56:35 -08:00
Ralph Castain
73a9a4f8c7
Merge pull request #4785 from rhc54/topic/cleanup
Silence warnings
2018-02-03 05:36:14 -08:00
Ralph Castain
73ef976ead Silence warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-03 00:29:06 -08:00
Howard Pritchard
1adf0873f7
Merge pull request #4766 from hppritcha/topic/squash_grdma_comp_warning
rcache/grdma: squash a compiler warning
2018-02-02 18:58:32 -07:00
Artem Polyakov
4add7cd5f5
Merge pull request #4781 from karasevb/fix_rmaps_nodelist
rmaps: fixed the ordering of `mpirun` target nodes
2018-02-02 12:29:15 -08:00
Boris Karasev
52e81ee4b1 rmaps: fixed the ordering of mpirun target nodes
Fixed the desync of job-nodelists between mpirun and orted
daemons. The issue was observed when using RSH launching because user
can provide arbitrary order of nodes regarding HNP placement.
The mpirun process propagate the daemon's nodelist order to nodes.
The problem was that HNP itself is assembling the nodelist based on
user provided order. As the result ranks assignment was calculated
differently on orted and mpirun.

Consider following example:
* User launches mpirun on node cn2.
* Hostlist is cn1,cn2,cn3,cn4; ppn=1
* mpirun is passing hostlist cn[2:2,1,3-4]@0(4) to orteds
So as result mpirun will assing rank 0 on cn1 while orted will assign
rank 0 on cn2 (because orted sees cn2 as the first element in the node
list)

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-02-01 17:16:05 +02:00
Ralph Castain
bc1d7ff2cc
Merge pull request #4780 from ggouaillardet/topic/ext3x
pmix/ext3x: remove autogenerated ext3x.h header file
2018-01-31 08:18:34 -08:00
Gilles Gouaillardet
43700faba1 pmix/ext3x: remove autogenerated ext3x.h header file
This header file was meant to be autogenerated, and for
some reasons, was never removed from the repository.
Update .gitignore as well

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 23:45:42 +09:00
Ralph Castain
7ddffc627d
Merge pull request #4776 from rhc54/topic/rte
Correct abstraction break and update ignores
2018-01-31 04:34:05 -08:00
Gilles Gouaillardet
9dcb7ab317
Merge pull request #4772 from ggouaillardet/topic/osc_sm_free
osc/sm: fix the osc_free callback
2018-01-31 17:06:36 +09:00
Ralph Castain
982415749c Update ignores for pmix/ext3x component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-30 21:41:55 -08:00
Ralph Castain
8e8a9aecc5 Correct abstraction break - direct reference to ORTE
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-30 21:19:14 -08:00
Ralph Castain
0c5bb999ed
Merge pull request #4775 from ggouaillardet/topic/ext3x
pmix/ext3x: bring external component up-to-date with the embedded pmix3x
2018-01-30 21:18:48 -08:00
Gilles Gouaillardet
8209fca842 pmix/ext3x: bring external component up-to-date with the embedded pmix3x
add the callback prototype for the upcoming PMIx_IOF_push() API

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:35:34 +09:00
Gilles Gouaillardet
0481277e93 pmix/ext3x: bring external component up-to-date with the embedded pmix3x
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:33:33 +09:00
Nathan Hjelm
bb212e0c94
Merge pull request #4767 from ggouaillardet/topic/vader_backing_file
btl/vader: make the backing file job specific
2018-01-30 21:27:02 -07:00
Gilles Gouaillardet
0285c63348 pmix/ext3x: generate component source when only static libraries are built
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:21:14 +09:00
Gilles Gouaillardet
34b45cc879 osc/sm: fix the osc_free callback
If component selection fails, then module->bases might be unallocated
when ompi_osc_sm_free() in invoked, so test it before trying to free()
module->bases[0].

Thanks Martin Binder for the report.

Refs open-mpi/ompi#4770

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 11:23:21 +09:00
Yossi Itigin
46967cfa63
Merge pull request #4764 from hoopoepg/topic/set-recv-status-canceled
request/state: update state for canceled request
2018-01-30 12:22:59 +02:00
Gilles Gouaillardet
611d7c2d27 btl/vader: make the backing file job specific
Since open-mpi/ompi@47fd2313ab
the backing file is now in /dev/shm by default. As a consequence,
the backing file name has to include the jobid so more than one job
can run at a time.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-30 16:52:51 +09:00
Howard Pritchard
c3cac6731f rcache/grdma: squash a compiler warning
tired of seeing this compiler warning

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-01-29 11:50:01 -07:00
Sergey Oblomov
7a5811d0a8 request/state: update state for canceled request
- fixed issue in set state for canceled request

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-01-29 18:26:20 +02:00
Ralph Castain
f3fbc1172d
Merge pull request #4757 from ggouaillardet/topic/iof_hnp
iof: do not release a sink before all read data is written.
2018-01-26 14:43:01 -08:00
Ralph Castain
e284a3e98b
Merge branch 'master' into topic/iof_hnp 2018-01-26 13:55:49 -08:00
Ralph Castain
5190f4e4ec
Merge pull request #4759 from rhc54/topic/missing
Properly terminate the job when executable not found
2018-01-26 13:55:32 -08:00
Ralph Castain
b643852d8a Properly terminate the job when executable not found
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-26 12:09:24 -08:00
Ralph Castain
c166e26265
Merge branch 'master' into topic/iof_hnp 2018-01-26 06:15:58 -08:00
Ralph Castain
d83d2be9ea
Merge pull request #4758 from rhc54/topic/sync
Refresh ORTE PMIx support
2018-01-25 15:00:37 -08:00
Ralph Castain
a17df810ed Sync with PMIx iof rfc
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 10:51:38 -08:00
Ralph Castain
e9cd7fd7e6 Update orte
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:53:43 -08:00
Ralph Castain
d1071397ac Update the orte/ess framework
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:43:44 -08:00
Ralph Castain
9fb80bd239 Update the opal/pmix base framework elements
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:37:52 -08:00
Ralph Castain
187352eb3d Update the PMIx external components
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:35:57 -08:00
Ralph Castain
a5679ef000 Update the PMIx 3.x component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:34:44 -08:00
Gilles Gouaillardet
54fb8ac5d5 iof: do not release a sink before all read data is written.
When too much data is available on stdin, it might not be
forwarded immediatly to the task (write() might fail with -EAGAIN),
so when stdin is terminated, there might be some remaining data
to be pushed to the task. In this case, delay the release of the sink
so no data is discarded.

Refs open-mpi/ompi#4744

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-25 16:29:22 +09:00
Gilles Gouaillardet
ebffaded5d iof/base: remove the unused iof_base_input_files MCA parameter
this option was only used by the iof/mr_hnp (aka Map/Reduce)
component that is no more part of master nor v3 branches.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-25 11:29:14 +09:00
Edgar Gabriel
57f946a798
Merge pull request #4749 from edgargabriel/pr/fs_ufs_bad_return_statement
fs/ufs and fs/lustre: remove erroneous return statement
2018-01-24 16:28:39 -06:00