27076 Commits

Author SHA1 Message Date
Gilles Gouaillardet
c4f64c39d1 Merge pull request #3526 from ggouaillardet/topic/unpack_hetero
opal/datatype: do not compute ptypes for OPAL predefined datatypes
2017-05-17 14:35:49 +09:00
Gilles Gouaillardet
384387bb53 Merge pull request #3411 from ggouaillardet/topic/mpi_f08_interfaces_callbacks
f08: make procedure(MPI_User_function) type available from mpi_f08
2017-05-17 09:02:26 +09:00
Jeff Squyres
23325c31d3 Merge pull request #3338 from jjhursey/topic/ompi_info_show_failed
`ompi_info --show-failed` feature
2017-05-16 17:08:43 -04:00
Jeff Squyres
39fa1d5c05 Merge pull request #3500 from bosilca/topic/any_source
Allow MPI_ANY_SOURCE in MPI_Sendrecv_replace.
2017-05-16 16:36:00 -04:00
Jeff Squyres
3e7ce4c034 Merge pull request #3537 from rhc54/topic/news
Add 1.10.7 NEWS
2017-05-16 16:17:37 -04:00
Ralph Castain
efb0795ce2 Add 1.10.7 NEWS
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-16 08:48:51 -07:00
Gilles Gouaillardet
22ab73cb1a Merge pull request #3471 from ggouaillardet/topic/execve_cmd
odls: fix handling of the orte fork agent
2017-05-15 15:07:39 +09:00
Gilles Gouaillardet
5a35a8e82c opal/datatype: do not compute ptypes for OPAL predefined datatypes
Fixes open-mpi/ompi#3522

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-15 11:43:48 +09:00
Ralph Castain
e682b5d7d8 Merge pull request #3523 from rhc54/topic/cleanup
Remove debug
2017-05-12 13:45:55 -07:00
Ralph Castain
b527c40dae Remove debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 12:41:36 -07:00
Ralph Castain
23af6c9d02 Merge pull request #3519 from rhc54/topic/nolocal
Fix --nolocal
2017-05-12 09:57:52 -07:00
Ralph Castain
4e5e8be85e Merge pull request #3520 from rhc54/topic/slotsalloc
Fix total_slots_allocated computation
2017-05-12 09:38:00 -07:00
Ralph Castain
45bbd598c1 Fix --nolocal
Fix the --nolocal option by ensuring we always check/remove the HNP from the list of available nodes if the flag is set
Ensure that the HNP node is included as available when nothing else is given

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 09:03:26 -07:00
Ralph Castain
29e083bffd Fix total_slots_allocated computation
On unmanaged allocations, we need to update the total_slots_allocated once the daemons have been launched and "discovered" their topology

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 08:21:52 -07:00
Jeff Squyres
9f317f0a5c Merge pull request #1390 from jsquyres/pr/minor-monitoring-library-cleanups
monitoring lib: rename to ompi_monitoring_prof.so
2017-05-11 11:11:02 -04:00
Ralph Castain
2f507a1113 Merge pull request #3517 from rhc54/topic/cisco2
When a daemon force-terminates, we don't get the show_help message it…
2017-05-11 07:47:25 -07:00
Ralph Castain
9164afbb08 When a daemon force-terminates, we don't get the show_help message it was trying to send because the message is at a lower priority than the termination event. Resolve this by putting the oob in its own progress thread. Also, use only that one thread by default - if someone needs more progress threads in the OOB, they can use the MCA param to get them.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-11 06:52:55 -07:00
KAWASHIMA Takahiro
0650d4141f Merge pull request #3401 from kawashima-fj/pr/fortran-argv-null
fortran: Fix `MPI_ARGV(S)_NULL` compilation error
2017-05-11 11:23:12 +09:00
KAWASHIMA Takahiro
854fa5fc55 Merge pull request #3489 from kawashima-fj/pr/group-remote-peers-2nd
group: Fix `ompi_group_have_remote_peers` (2nd try)
2017-05-11 11:22:15 +09:00
Ralph Castain
987339ed76 Merge pull request #3515 from rhc54/topic/cisco2
Add some more debug output
2017-05-10 17:29:18 -07:00
Ralph Castain
f47124e4d3 Finally fix the problem - the key was knowing there were more than 2 topologies involved, and that the HNP is not allocated. Give up on being cute and just search the darned list of topologies - there won't be that many, and if there are (so the scan takes a while), then too bad.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-10 16:44:19 -07:00
Ralph Castain
3b29b78a19 Merge pull request #3507 from rhc54/topic/cleanup
Sigh - remove debug
2017-05-10 13:36:27 -07:00
Matias A Cabral
644641d06f PSM and PSM2 MTLs check on the max message size allowed by API.
OMPI send and receive messages use size_t for the length, while PSM and PSM2
psm(2)mq_send/receive use uint32_t. Type size_t is 64 bits on 64-bit architectures.
Therefore, this patch adds a sanity check on the length of the message
and fails gracefully.

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2017-05-10 12:45:11 -07:00
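
The length check described in the PSM/PSM2 MTL commit above can be illustrated with a minimal sketch. This is not the code from the patch: the function name `sketch_check_msg_len` and the `SKETCH_*` status codes are hypothetical and chosen only for illustration. It shows the general idea of refusing a `size_t` length that does not fit in a `uint32_t` rather than truncating it silently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; not Open MPI's error constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TOO_LARGE = -1 };

/*
 * Narrow a size_t message length to the 32-bit length that a lower-level
 * send/receive call accepts. Returns SKETCH_ERR_TOO_LARGE instead of
 * truncating when the length does not fit in 32 bits.
 */
static int sketch_check_msg_len(size_t len, uint32_t *out_len)
{
    assert(out_len != NULL);
    if (len > (size_t)UINT32_MAX) {
        return SKETCH_ERR_TOO_LARGE;   /* caller reports the error */
    }
    *out_len = (uint32_t)len;          /* safe: value fits in 32 bits */
    return SKETCH_SUCCESS;
}
```

On a platform where `size_t` is itself 32 bits wide the comparison can never be true, so the check only has an effect on 64-bit builds, which matches the situation the commit message describes.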
Ralph Castain
55f4b825af Add verbose output to nidmap code for debugging as this is a new, and sometimes fragile, feature
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-10 12:40:02 -07:00
Ralph Castain
911961ee21 Sigh - remove debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-10 11:26:42 -07:00
Ralph Castain
2d93d15aa7 Merge pull request #3502 from rhc54/topic/cisco
Fix nidmap computation to deal with hetero nodes
2017-05-10 11:21:12 -07:00
Ralph Castain
c42ce3eeea Merge pull request #3505 from rhc54/topic/rmlofi
Update the RML OFI by copying the updated files from @anandhis branch
2017-05-10 11:20:46 -07:00
Ralph Castain
50646b07ce Update the RML OFI by copying the updated files from @anandhis branch
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-10 09:17:06 -07:00
Jeff Squyres
c34ba88b22 monitoring lib: fix some Makefile.am macros
* Use the proper lib prefix name
* Use the proper extra LDFLAGS

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-10 09:03:59 -07:00
Jeff Squyres
626167f2a9 monitoring lib: rename to ompi_monitoring_prof.so
The library that is installed is specific to Open MPI, so put an
"ompi_" prefix on it.

Also do some minor line wrappings and cleanups of text.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-10 09:03:56 -07:00
Ralph Castain
442e307a6e Fix the nidmap computation to deal with hetero nodes
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-10 08:43:28 -07:00
Gilles Gouaillardet
026f3dd2dd pmix2x: plug a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-10 14:57:44 +09:00
Gilles Gouaillardet
3c6631ff6c opal: fix FIND_FIRST_ZERO macro for opal_pointer_array internal handling
Thanks George for the patch.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-10 14:57:44 +09:00
George Bosilca
86a7b317a5 Allow MPI_ANY_SOURCE in MPI_Sendrecv_replace.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-05-09 16:57:15 -04:00
bosilca
d7ebcca93f Add volatile to the pointer in the list_item structure. (#3468)
This change has the side effect of improving the performance of all
atomic data structures (in addition to making the code correct under a
certain interpretation of the volatile usage).
This commit fixes #3450.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-05-09 10:12:20 -04:00
bosilca
cbf03b3113 Topic/datatype (#3441)
* Don't overflow the internal datatype count.
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Optimize the datatype creation.
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment, or when
one calls get_elements. It saves space and makes the convertor creation a
little faster in some cases.

Rearrange the fields in the datatype description structs.

The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Fix the boundary computation.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* test/datatype: add test for short unpack on heterogeneous cluster

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Trying to reduce the cost of creating a convertor.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Respect the unpack boundaries.
As Gilles suggested on #2535 the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes #2535.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-05-09 09:31:40 -04:00
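
The "Respect the unpack boundaries" item in the commit above says that unpacking should be driven by the amount of packed data provided, not only by the requested count. A minimal sketch of that idea follows. It is not the `opal_unpack_general_function` implementation: the helper `sketch_unpack_bounded` is hypothetical, and it assumes fixed-size elements with no data conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most `requested` fixed-size elements out of `packed`, but never
 * read more than the `packed_len` bytes actually supplied. Returns the
 * number of whole elements unpacked.
 */
static size_t sketch_unpack_bounded(void *dst, size_t requested,
                                    size_t elem_size,
                                    const void *packed, size_t packed_len)
{
    assert(dst != NULL && packed != NULL && elem_size > 0);
    size_t available = packed_len / elem_size;   /* whole elements only */
    size_t count = requested < available ? requested : available;
    memcpy(dst, packed, count * elem_size);      /* stays within packed_len */
    return count;
}
```

A caller that asks for five elements but supplies only three elements' worth of packed bytes gets three back, and the remaining destination slots are left untouched.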
Gilles Gouaillardet
a66909b8b4 Merge pull request #3488 from ggouaillardet/topic/romio314_ad_nfs
romio314: ad_nfs fixes for large files from upstream mpich
2017-05-09 16:58:02 +09:00
Gilles Gouaillardet
26f44da429 coll/base: fix mca_coll_base_alltoallv_intra_basic_inplace()
correctly handle the case when a MPI task has no data to send/recv

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 15:19:14 +09:00
Gilles Gouaillardet
eaf050cfe1 romio314: adio/ad_nfs: fix buffer overflows in ADIOI_NFS_{Read,Write}Strided
Refs: models/mpich#2338
Refs: models/mpich#2617

Signed-off-by: Rob Latham <robl@mcs.anl.gov>

(back-ported from upstream commit pmodels/mpich@642db57648)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:11:12 +09:00
Gilles Gouaillardet
02af10ce6e romio314: update NFS read/write routines for large xfers
When we updated UFS and others we left NFS alone.  HDF group would like
a fix, so here we go.

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>

(back-ported from upstream commit pmodels/mpich@684df9f4c9)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:07:47 +09:00
Jeff Squyres
7185567d50 Merge pull request #3455 from jsquyres/pr/fix-lustre-configure
Lustre configure fixes
2017-05-08 16:49:23 -04:00
Ralph Castain
2f11d371cd Merge pull request #3448 from rhc54/topic/omp
Implement the changes required to support cross-library coordination.…
2017-05-08 11:08:36 -07:00
Ralph Castain
0afcb1a448 Update to support server self-notifications
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
Ralph Castain
ef0e0171c9 Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c).
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
Ralph Castain
42d31454a5 Merge pull request #3469 from rhc54/topic/nidmap
Do not pass topologies during tree spawn of daemons as there is no wa…
2017-05-08 06:22:50 -07:00
KAWASHIMA Takahiro
e453e42279 group: Fix ompi_group_have_remote_peers
`ompi_group_t::grp_proc_pointers[i]` may have sentinel values even
for processes which reside in the local node because the array for
`MPI_COMM_WORLD` is set up before `ompi_proc_complete_init`, which
allocates `ompi_proc_t` objects for processes that reside in the local
node, is called in `MPI_INIT`. So using `ompi_proc_is_sentinel`
against `ompi_group_t::grp_proc_pointers[i]` in order to determine
whether the process resides in a remote node is not appropriate.

This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, where sm OSC uses
`ompi_group_have_remote_peers`.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-05-08 20:28:51 +09:00
KAWASHIMA Takahiro
9841ad3035 Merge pull request #3472 from open-mpi/revert-3410-pr/group-remote-peers
Revert "group: Fix `ompi_group_have_remote_peers`"
2017-05-08 18:47:30 +09:00
KAWASHIMA Takahiro
913adce59b Revert "group: Fix ompi_group_have_remote_peers"
2017-05-08 18:42:18 +09:00
Gilles Gouaillardet
e101f2b3f9 orte/util: fix vpids parsing in orte_util_nidmap_parse()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-08 16:46:13 +09:00
Gilles Gouaillardet
16fc0996e6 odls: fix handling of the orte fork agent
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-08 16:07:13 +09:00