23609 Commits

Author SHA1 Message Date
Piotr Lesnicki
1dd5487fae odls: close only used file descriptors at fork/exec 2015-09-18 16:44:57 +02:00
Igor Ivanov
fb5d934e2f oshmem/proc: Refactor oshmem_proc to meet new add_proc changes
ompi has a new mpi_add_procs_cutoff argument that can control the
creation of ompi_proc_t, but we need to be sure that all
ompi_proc_t objects exist during oshmem_group_all creation.
This could probably be done in a more flexible way later.

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-18 17:40:21 +03:00
Edgar Gabriel
01fcfb08fe do not set the contiguous flag in two_phase_file_read_all. This optimization
needs some more debugging for the two_phase component, and is disabled
for two_phase_file_write_all as well.
2015-09-18 09:30:50 -05:00
Edgar Gabriel
3734a38370 this file should have been part of the previous commit, which removed io_ompio_nbc.[ch] 2015-09-18 09:28:25 -05:00
Edgar Gabriel
cf46a6bd4d remove the io_ompio_nbc.[ch] files, they are not used anymore at this point in time. 2015-09-18 09:26:25 -05:00
Gilles Gouaillardet
a611274704 pml: fix commit open-mpi/ompi@6e6a3e965c
do not use the const modifier for allocator nor recv buffers
2015-09-18 09:54:18 +09:00
Rolf vandeVaart
7da614c75e Add ability for user to empty the CUDA IPC registration cache when it is full 2015-09-17 16:42:16 -04:00
Jeff Squyres
567c9e3a5b mtl_ofi_component.c: add missing argv.h header 2015-09-17 10:05:05 -07:00
Igor Ivanov
f437f4012e Revert "scoll/mpi: work around bug in oshmem/proc design"
This workaround is needless after oshmem/proc refactoring

This reverts commit 202c6a38e4151c4a4eab9884abaf21f621f87f6d.
2015-09-17 19:01:24 +03:00
Igor Ivanov
4b8d9b8eff oshmem/proc: Refactor proc component
Most of the functionality of oshmem_proc duplicates ompi_proc. In addition,
the current logic does not allow oshmem to be initialized without ompi
startup.
This refactoring avoids code duplication, reduces memory use, and makes
oshmem easier to support.
oshmem_proc is now a transparent ompi_proc structure that can be
extended with oshmem-specific data.

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-17 18:49:00 +03:00
Igor Ivanov
11f61790ee ompi/proc: Extend ompi_proc_t structure with padding to support oshmem data
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-17 18:48:59 +03:00
Nathan Hjelm
dfbe584c92 ompi/group: fix typos in add_procs changes
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-17 09:21:32 -06:00
rhc54
6efe91a24b Merge pull request #904 from ggouaillardet/topic/cpuset
hwloc: do not count not allowed cores in df_search_cores
2015-09-17 03:55:50 -07:00
Gilles Gouaillardet
975b6fd51b hwloc: do not count not allowed cores in df_search_cores 2015-09-17 13:10:34 +09:00
Nathan Hjelm
d8df9d414d osc/rdma: add true RDMA one-sided component
This commit adds support for performing one-sided operations over
supported hardware (currently Infiniband and Cray Gemini/Aries). This
component is still undergoing active development.

Current features:

 - Use network atomic operations (fadd, cswap) for implementing
   locking and PSCW synchronization.

 - Aggregate small contiguous puts.

 - Reduced memory footprint by storing window data (pointer, keys,
   etc) at the lowest rank on each node. The data is fetched as each
   process needs to communicate with a new peer. This is a trade-off
   between the performance of the first operation on a peer and the
   memory utilization of a window.

TODO:

 - Add support for the accumulate_ops info key. If it is known that
   the same op or same op/no op is used it may be possible to use
   hardware atomics for fetch-and-op and compare-and-swap.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 15:01:33 -06:00
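
Note on d8df9d414d: the commit message says locking is built on network atomic operations (fadd, cswap). The sketch below is only a local-memory analogy using C11 atomics, not the component's code. The lock-word layout and every `example_` name are assumptions made for illustration: compare-and-swap takes an exclusive lock only when the word is zero, and fetch-and-add registers a shared holder and backs out if an exclusive holder is present.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed lock-word layout for this sketch only:
 * bit 0      = exclusive lock held
 * bits 1..63 = number of shared holders (each holder adds 2) */
#define EXAMPLE_EXCLUSIVE  UINT64_C(1)
#define EXAMPLE_SHARED_INC UINT64_C(2)

typedef _Atomic uint64_t example_lock_t;

/* cswap: succeeds only if nobody holds the lock in any mode. */
static bool example_try_lock_exclusive(example_lock_t *lock)
{
    uint64_t expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, EXAMPLE_EXCLUSIVE);
}

static void example_unlock_exclusive(example_lock_t *lock)
{
    atomic_fetch_sub(lock, EXAMPLE_EXCLUSIVE);
}

/* fadd: optimistically register as a shared holder, then undo the
 * increment if the previous value shows an exclusive holder. */
static bool example_try_lock_shared(example_lock_t *lock)
{
    uint64_t prev = atomic_fetch_add(lock, EXAMPLE_SHARED_INC);
    if (prev & EXAMPLE_EXCLUSIVE) {
        atomic_fetch_sub(lock, EXAMPLE_SHARED_INC);
        return false;
    }
    return true;
}

static void example_unlock_shared(example_lock_t *lock)
{
    atomic_fetch_sub(lock, EXAMPLE_SHARED_INC);
}
```

In a real one-sided setting the lock word would live in a remote window and the two operations would be issued through the network rather than on local memory.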
Nathan Hjelm
fd42343ff0 osc/pt2pt: reduce memory footprint of window
This commit updates osc/pt2pt to allocate peer objects as they are
needed rather than all at once. Additionally, to help improve the
memory footprint a new synchronization structure has been added.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 13:01:56 -06:00
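
Note on fd42343ff0: allocating peer objects on demand rather than all at once can be sketched as follows. This is a minimal illustration, not the osc/pt2pt code. All `example_` names and the struct fields are hypothetical, and the pointer array is a simplification (the actual component may use a different container).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-peer state; the fields are placeholders. */
typedef struct example_peer {
    int rank;
    int pending_ops;
} example_peer_t;

typedef struct example_window {
    int size;                 /* number of ranks in the window */
    example_peer_t **peers;   /* every slot is NULL until first use */
} example_window_t;

static int example_window_init(example_window_t *w, int size)
{
    w->size = size;
    w->peers = malloc((size_t)size * sizeof(*w->peers));
    if (NULL == w->peers) {
        return -1;
    }
    for (int i = 0; i < size; ++i) {
        w->peers[i] = NULL;
    }
    return 0;
}

/* Return the peer object for a rank, creating it on first access. */
static example_peer_t *example_window_peer(example_window_t *w, int rank)
{
    if (rank < 0 || rank >= w->size) {
        return NULL;
    }
    if (NULL == w->peers[rank]) {
        example_peer_t *p = calloc(1, sizeof(*p));
        if (NULL == p) {
            return NULL;
        }
        p->rank = rank;
        w->peers[rank] = p;
    }
    return w->peers[rank];
}

/* Count how many peer objects have been allocated so far. */
static int example_window_allocated(const example_window_t *w)
{
    int count = 0;
    for (int i = 0; i < w->size; ++i) {
        count += (NULL != w->peers[i]);
    }
    return count;
}

static void example_window_fini(example_window_t *w)
{
    for (int i = 0; i < w->size; ++i) {
        free(w->peers[i]);
    }
    free(w->peers);
    w->peers = NULL;
    w->size = 0;
}
```

A window over many ranks that only ever talks to a few of them pays for those few peer objects plus one pointer per rank.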
Nathan Hjelm
131681acc6 Merge pull request #901 from hjelmn/comm_fix
ompi/comm: fix comm_[i]dup on intracommunicators
2015-09-16 12:43:19 -06:00
Nathan Hjelm
c84c05bab7 ompi/comm: fix comm_[i]dup on intracommunicators
The behavior of ompi_comm_set was changed to get the remote size from
the remote group. This broke how ompi_comm_[i]dup were using
ompi_comm_set. In order to adapt to the new behavior these functions
now pass NULL for the remote group if the communicator is not an
inter-communicator.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 10:31:18 -06:00
rhc54
55d40910ee Merge pull request #899 from rhc54/topic/cov
Silence some warnings and address Coverity issues
2015-09-16 09:23:32 -07:00
Ralph Castain
1b7930ad52 Silence some warnings and address Coverity issues 2015-09-16 07:58:22 -07:00
Ralph Castain
8b88ea9b13 Fix singletons by removing stale code 2015-09-16 00:58:05 -07:00
George Bosilca
02624bd0b6 Fix all treematch issues identified by Coverity. 2015-09-15 23:49:11 -04:00
George Bosilca
6ab5f68fc3 indentation. 2015-09-15 22:46:13 -04:00
rhc54
5597416fe0 Merge pull request #897 from rhc54/topic/oob
Remove the last involvement of the OOB system from the MPI layer
2015-09-15 14:40:21 -07:00
Jeff Squyres
7cb546a221 core: yow; this should absolutely not be in the repo! 2015-09-15 16:15:04 -04:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Rolf vandeVaart
555f14a479 Merge pull request #893 from rolfv/pr/more-verbose-fix
Cleanup handle verbose messages
2015-09-15 15:45:52 -04:00
rhc54
3b4e982f86 Merge pull request #896 from hjelmn/comm_set_fix
ompi/comm: fix bug in ompi_comm_set
2015-09-15 12:25:55 -07:00
Nathan Hjelm
9c45c63143 ompi/dpm: fix typo in dynamic communicator detection
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-15 12:42:58 -06:00
Nathan Hjelm
6379178046 ompi/comm: fix bug in ompi_comm_set
This commit updates the behavior of ompi_comm_set to explicitly take
either local/remote group(s) OR local/remote array(s). If array(s) are
in use the sizes will be taken from the appropriate group(s).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-15 11:37:44 -06:00
Nathan Hjelm
ca4be77ff1 Merge pull request #894 from hjelmn/osh_memheap_fix
oshmem/memheap: correct usage of opal_dss functions
2015-09-15 08:05:39 -06:00
George Bosilca
0e7e14449f Typo in the modex_recv. 2015-09-14 18:00:02 -04:00
Nathan Hjelm
69b9bc2269 oshmem/memheap: correct usage of opal_dss functions
Any buffer given to opal_dss.load becomes the responsibility of the
opal_buffer_t object. It will be freed automatically if either the
opal_buffer_t is released or opal_dss.load is called again on the
buffer. opal_dss.unload will not prevent this unless no unpacking
takes place between the .load and .unload calls.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-14 13:54:56 -06:00
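
Note on 69b9bc2269: the ownership rule described in the message can be modelled with a toy buffer. This is not the opal_dss implementation. The `example_` functions are hypothetical stand-ins written only to mirror the behaviour the commit messages in this log describe: load takes ownership of the payload, and unload hands the original pointer back only if nothing was unpacked in between; otherwise the caller receives a copy of the unread bytes and the buffer keeps (and later frees) the original allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy buffer: owns `base` from load until release or the next load. */
typedef struct example_buffer {
    char *base;     /* start of the owned allocation */
    char *unpack;   /* next byte to unpack */
    size_t used;    /* number of bytes stored */
} example_buffer_t;

/* Take ownership of payload; a previously loaded payload is freed. */
static void example_load(example_buffer_t *b, char *payload, size_t len)
{
    free(b->base);
    b->base = b->unpack = payload;
    b->used = len;
}

static int example_unpack_byte(example_buffer_t *b, char *out)
{
    if (NULL == b->base || (size_t)(b->unpack - b->base) >= b->used) {
        return -1;
    }
    *out = *b->unpack++;
    return 0;
}

/* Give the unread bytes to the caller, who must free() the result. */
static char *example_unload(example_buffer_t *b, size_t *len)
{
    *len = 0;
    if (NULL == b->base) {
        return NULL;
    }
    size_t remaining = b->used - (size_t)(b->unpack - b->base);
    *len = remaining;
    if (b->unpack == b->base) {
        /* Nothing unpacked: ownership moves back to the caller. */
        char *p = b->base;
        b->base = b->unpack = NULL;
        b->used = 0;
        return p;
    }
    if (0 == remaining) {
        return NULL;
    }
    /* Partly unpacked: return a copy; the buffer still owns base. */
    char *copy = malloc(remaining);
    if (NULL != copy) {
        memcpy(copy, b->unpack, remaining);
    }
    return copy;
}

static void example_release(example_buffer_t *b)
{
    free(b->base);
    b->base = b->unpack = NULL;
    b->used = 0;
}
```

Under this model, a caller that frees the pointer it passed to load after a partial unpack would cause a double free, which is the class of bug the commit addresses.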
Rolf vandeVaart
34fe2188cd Cleanup handle verbose messages 2015-09-14 11:01:25 -04:00
Mike Dubman
6f82ce3fc8 Merge pull request #879 from igor-ivanov/pr/disable-oshmem-issue
Prevent oshmem related files inside install folder in case --disable-oshmem
2015-09-14 12:12:06 +03:00
Gilles Gouaillardet
d5af5d106c btl/sm: mca_btl_sm_sendi: do not set *descriptor when descriptor is NULL 2015-09-14 14:04:40 +09:00
Nathan Hjelm
f29b65aa14 ompi/proc: fix typos CID 1323840
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 21:02:30 -06:00
rhc54
33f5e4c766 Merge pull request #892 from rhc54/topic/pmix
Fix the no-disconnect test by resolving a segfault on free - opal_dss…
2015-09-11 16:01:42 -07:00
Ralph Castain
fbcf819d2e Remove unnecessary include 2015-09-11 15:53:00 -07:00
Nathan Hjelm
f798c909d1 Merge pull request #883 from hjelmn/comm_split_update
ompi/comm: improve comm_split_type scalability
2015-09-11 16:35:34 -06:00
Rolf vandeVaart
d78b954fd4 Merge pull request #891 from rolfv/pr/minor-cuda-verbosity-fixes
Fix cuda verbosity messages
2015-09-11 16:33:22 -04:00
Ralph Castain
22d7c0081a Fix the no-disconnect test by resolving a segfault on free - opal_dss.unload will return the remaining unpacked portion of a buffer. As such, it cannot return the pointer to that info as it might be partway inside of a malloc'd region. So copy the data out of the buffer. 2015-09-11 13:01:35 -07:00
Ralph Castain
b60b03d613 It is okay not to get the hostname - we don't require that it be provided 2015-09-11 13:01:20 -07:00
Nathan Hjelm
c45789a222 ompi/comm: improve comm_split_type scalability
This commit includes two changes. First, the locality code has been
factored out to improve readability and maintainability. Second,
instead of looking up each proc using ompi_group_peer_lookup the code
now uses ompi_group_peer_lookup_existing. The code falls back on modex
if a proc doesn't exist. This will prevent MPI_Comm_split_type from
allocating ompi_proc_t's for every process in the job.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-11 13:53:48 -06:00
Rolf vandeVaart
90dd1d264b Fix cuda verbosity messages 2015-09-11 15:44:36 -04:00
Nathan Hjelm
1868b5937c Merge pull request #889 from hjelmn/sentinel_update
Use the low instead of the high bit to indicate a proc is a sentinel
2015-09-11 12:30:27 -06:00
rhc54
c31093ff19 Merge pull request #890 from rhc54/topic/fixpmi
Revert "Revert "Fix the handling of cpusets so we get the correct cpu…
2015-09-11 09:25:24 -07:00
Nathan Hjelm
898a0a038c bml/r2: fix coverity CID 1323765
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:39:10 -06:00
Nathan Hjelm
64c8f124fc Use the low instead of the high bit to indicate a proc is a sentinel
The assumption that the high bit is not in use in pointers on any of our
supported platforms was incorrect. A better assumption is that all
ompi_proc_t pointers will be at least 2-byte aligned. This allows us
to use the low bit. To do this we drop the highest bit of the
opal_process_name_t jobid (hope this is ok) and use the low bit to
indicate the proc is really a sentinel.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:32:02 -06:00
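
Note on 64c8f124fc: the low-bit tagging described in the message can be sketched as below. This is an illustration under stated assumptions, not the ompi code: it assumes a 64-bit slot, a 32-bit jobid and vpid, and process objects that are at least 2-byte aligned. All `example_` names are hypothetical. Shifting the packed name left by one frees the low bit for the tag, which is why the jobid loses its highest bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical process object; only its alignment matters here. */
typedef struct example_proc {
    uint32_t jobid;
    uint32_t vpid;
} example_proc_t;

/* A slot holds either a real pointer (low bit 0) or a sentinel
 * encoding a process name (low bit 1). */
typedef uint64_t example_slot_t;

static inline example_slot_t example_from_proc(const example_proc_t *proc)
{
    example_slot_t slot = (example_slot_t)(uintptr_t)proc;
    assert(0 == (slot & 1u));   /* relies on >= 2-byte alignment */
    return slot;
}

/* Pack jobid (31 usable bits) and vpid, then set the tag bit. */
static inline example_slot_t example_make_sentinel(uint32_t jobid, uint32_t vpid)
{
    uint64_t name = ((uint64_t)(jobid & 0x7FFFFFFFu) << 32) | vpid;
    return (name << 1) | 1u;
}

static inline int example_is_sentinel(example_slot_t slot)
{
    return (int)(slot & 1u);
}

/* Recover the name from a sentinel by dropping the tag bit. */
static inline void example_decode(example_slot_t slot,
                                  uint32_t *jobid, uint32_t *vpid)
{
    uint64_t name = slot >> 1;
    *jobid = (uint32_t)((name >> 32) & 0x7FFFFFFFu);
    *vpid = (uint32_t)(name & 0xFFFFFFFFu);
}
```

A lookup routine can then test the low bit first and only allocate a full process object when a sentinel is actually dereferenced.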
Ralph Castain
dc5796b8a1 Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
Fix the locality computation by correctly computing the vpid of the local peer

This reverts commit open-mpi/ompi@6a8fad49e5.
2015-09-11 08:29:51 -07:00