1
1
Граф коммитов

26012 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
fb5bcc47ce ess/singleton: use opal_setenv instead of putenv
so it fixes a memory leak on finalize

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-28 09:32:30 +09:00
Gilles Gouaillardet
af67183e2f pml/v: fix a memory leak
close the framework if no more component should be used

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-28 09:32:30 +09:00
Gilles Gouaillardet
ef2b3ac8d2 rml/oob: fix misc memory leaks in open_conduit()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-28 09:28:42 +09:00
Gilles Gouaillardet
831f7d9c9d rml/base: plug misc memory leaks
plug leaks in orte_rml_API_get_contact_info() and orte_rml_base_close()

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-28 09:28:05 +09:00
rhc54
698dac108b Merge pull request #2309 from rhc54/topic/pmix
Update to latest PMIx master - mostly updates example codes, but incl…
2016-10-27 14:21:46 -07:00
Ralph Castain
f4a55118e6 Update to latest PMIx master - mostly updates example codes, but includes one critical cleanup during finalize.
NOTE: set dstore shared memory storage option to "on" by default

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-10-27 11:16:03 -07:00
Jeff Squyres
e9aab634af Merge pull request #2294 from ggouaillardet/topic/fortran_use_mpi_tkr
fortran/use-mpi-tkr misc fixes
2016-10-27 13:09:06 -04:00
Nathan Hjelm
9d92075e60 btl/self: rewrite to decrease memory usage (#2307)
This commit rewrites much of the btl/self component to fix a long
standing memory usage bug. Before this commit the prepare_src path
would always allocate a max send fragment (256kB). This caused the
rank to allocate 32 * 256k useless buffers from one send. This commit
makes the following changes:

 - Add the MCA_BTL_FLAGS_GET flag by default. No reason not to set it.

 - Reduce the eager limit, max send size, buffers per allocation, and
   maximum buffer count per fragment size. These changes should have
   no noticible affect on performance but should greatly reduce the
   memory usage of the component.

 - Implement the sendi function. This should reduce self send latency
   somewhat.

 - Rewrite prepare_src to never allocate a eager or max send fragment
   for contiguous data.

 - add_procs needs to return something in the peer array for the proc
   self not just set the reachability bit. Now stores (void *) 1.

 - Various cleanups. Removed and unused file.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-10-27 12:34:54 -04:00
Mike Dubman
83e3323646 Merge pull request #2299 from alex-mikheev/topic/oshmem_init_fix
OSHMEM: fixes crash during initialization
2016-10-27 14:48:33 +03:00
Gilles Gouaillardet
3d4285b04d oob/tcp: silence valgrind warning
fully initialize allocated memory to keep valgrind happy

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-27 17:12:46 +09:00
Alex Mikheev
511dd43736
oshmem: fixes typo in the error message
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-10-27 09:27:45 +03:00
Gilles Gouaillardet
ad7f3f93b0 mpi: support MPI_Dims_create(..., ndims=0, ...)
this is a bozo case, but it should not fail with MPI_ERR_DIMS

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-27 14:13:51 +09:00
Gilles Gouaillardet
bf789762a7 fortran/use-mpi-tkr: fix MPI_Sizeof handling
MPI_Sizeof related stuff has been moved to their own files.
Remove MPI_Sizeof from Fortran interfaces when it cannot be built
(e.g. stock gcc 4.8 on CentOS 7)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-27 08:54:28 +09:00
Gilles Gouaillardet
1a16e68c26 fortran/use-mpi-trk: add PMPI_* interfaces in mpi module
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-27 08:54:07 +09:00
Gilles Gouaillardet
8e26e78728 fortran/use-mpi-tkr: only build MPI_File support if requested
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-27 08:54:07 +09:00
Gilles Gouaillardet
5543b19e9a fortran/use-mpi-tkr: rename mpi-f90-cptr-interfaces.F90 into mpi-f90-cptr-interfaces.h
this file is meant to be included and not compiled, so use a consistent naming

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-27 08:54:07 +09:00
rhc54
2b18044051 Merge pull request #2301 from rhc54/topic/update
Update PMIx to latest master tarball. Ensure we set the HNP name for …
2016-10-26 16:42:15 -07:00
rhc54
60099c9d0e Merge pull request #2285 from anandhis/ofi2
Ofi2
2016-10-26 15:52:37 -07:00
Ralph Castain
f298f294e1 Update PMIx to latest master tarball. Ensure we set the HNP name for orted's so that PMIx_Lookup can find the server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-10-26 15:48:56 -07:00
Anandhi S Jayakumar
94593ca20b Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.
modified:   ../orte/mca/rml/base/rml_base_frame.c
	modified:   ../orte/mca/rml/base/rml_base_stubs.c
	deleted:    ../orte/mca/rml/ofi/.opal_ignore
	modified:   ../orte/mca/rml/ofi/Makefile.am
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c
	modified:   ../orte/test/system/ofi_conduit_stress.c

	Removed stale include directive
	modified:   ../orte/mca/rml/ofi/Makefile.am

The ofi plugin supports multiple providers, and identifies them
by ofi_prov_id,  changed the previous name conduit_id to ofi_prov_id
	modified:   ../orte/mca/rml/base/base.h
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_request.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.

	modified:   ../orte/mca/rml/base/rml_base_frame.c
	modified:   ../orte/mca/rml/base/rml_base_stubs.c
	deleted:    ../orte/mca/rml/ofi/.opal_ignore
	modified:   ../orte/mca/rml/ofi/Makefile.am
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c
	modified:   ../orte/test/system/ofi_conduit_stress.c

	Removed stale include directive
	modified:   ../orte/mca/rml/ofi/Makefile.am

The ofi plugin supports multiple providers, and identifies them
by ofi_prov_id,  changed the previous name conduit_id to ofi_prov_id
	modified:   ../orte/mca/rml/base/base.h
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_request.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

Fixed merge issues, and minor pull-request comments
	modified:   ../orte/mca/rml/base/base.h
	modified:   ../orte/mca/rml/base/rml_base_frame.c
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c

Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.

	modified:   ../orte/mca/rml/base/rml_base_frame.c
	modified:   ../orte/mca/rml/base/rml_base_stubs.c
	deleted:    ../orte/mca/rml/ofi/.opal_ignore
	modified:   ../orte/mca/rml/ofi/Makefile.am
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c
	modified:   ../orte/test/system/ofi_conduit_stress.c

	Removed stale include directive
	modified:   ../orte/mca/rml/ofi/Makefile.am

The ofi plugin supports multiple providers, and identifies them
by ofi_prov_id,  changed the previous name conduit_id to ofi_prov_id
	modified:   ../orte/mca/rml/base/base.h
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_request.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.

	modified:   ../orte/mca/rml/base/rml_base_frame.c
	modified:   ../orte/mca/rml/base/rml_base_stubs.c
	deleted:    ../orte/mca/rml/ofi/.opal_ignore
	modified:   ../orte/mca/rml/ofi/Makefile.am
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c
	modified:   ../orte/test/system/ofi_conduit_stress.c

	Removed stale include directive
	modified:   ../orte/mca/rml/ofi/Makefile.am

Fixed merge issues, and minor pull-request comments
	modified:   ../orte/mca/rml/base/base.h
	modified:   ../orte/mca/rml/base/rml_base_frame.c
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c

Removed trailing space
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c

Cleaned up test- ofi_conduit_stress.c
	modified:   ../orte/test/system/ofi_conduit_stress.c

cleaned up printing the provider info during initialisation
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c

Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>

Fixing warnings
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>

minor cleanup
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>

more cleanup
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c

Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>

Sending the ethernet address only in the get_contact_info, rest will be sent through modex
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c

Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>

Adding error logging on failures
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c

Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>

Handling the OPAL_MODEX_SEND/RECV generically for all ofi providers.
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>

Adding to build ofi for limited people
	new file:   ../orte/mca/rml/ofi/.opal_ignore
	new file:   ../orte/mca/rml/ofi/.opal_unignore

Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>

Removign the error logging for now
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
2016-10-26 13:11:07 -07:00
Alex Mikheev
f630b43285
OSHMEM: fixes crash during initialization
Do not call mpi comm_dup() if mpi failed to initialize. Also do not set
signal handlers.
Small code styling fixes.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-10-26 11:15:06 +03:00
Gilles Gouaillardet
8cc3f288c9 opal: fix opal_class_finalize() usage
the class system can be initialized/finalized as many times as we like,
so there is no more need to have opal_class_finalize() invoked in a destructor

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-26 15:15:54 +09:00
rhc54
31799ece6f Merge pull request #2298 from rhc54/topic/notify
When mpirun operates in --continuous mode, we won't terminate the job…
2016-10-25 15:00:04 -07:00
Ralph Castain
d031946c46 When mpirun operates in --continuous mode, we won't terminate the job when a remote process dies. In that case, we have to activate both the waitpid _and_ the IOF complete states to ensure we properly mark the proc as dead and perform any required notifications
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-10-25 12:18:14 -07:00
Edgar Gabriel
2076622924 Merge pull request #2238 from edgargabriel/pr/delete-error-codes
update the error codes reported by file_delete
2016-10-25 12:38:03 -05:00
George Bosilca
028e747470 Do not alter ompi_coll_tuned_use_dynamic_rules.
This is set globally as an MCA parameter and should be never
altered based on a single communicator setting.
2016-10-25 12:17:25 -04:00
George Bosilca
253eb80e26 Code cleaning of the tuned module. 2016-10-25 12:17:25 -04:00
Edgar Gabriel
74441b960b update the error codes reported by file_delete 2016-10-25 10:15:14 -05:00
Jeff Squyres
582f290519 Merge pull request #1014 from ggouaillardet/poc/libnl_version_check
configury: check libnl version and warn in case of conflict
2016-10-25 10:34:18 -04:00
Gilles Gouaillardet
b1aedf457f Merge pull request #2283 from ggouaillardet/topic/pml_ob1_recv_request_type_reset
pml/ob1: correctly reset receive request type before init
2016-10-25 15:15:59 +09:00
Gilles Gouaillardet
8e788b5aee pml/ob1: refactor append_recv_req_to_queue() to improve readability
and fix a typo in a comment

Thanks George for the patch
2016-10-25 10:50:40 +09:00
rhc54
3e430caed8 Merge pull request #2290 from rhc54/topic/rmlapp
Open the conduits for application procs
2016-10-24 18:18:06 -07:00
Gilles Gouaillardet
f2a80dc09f configury: check libnl version and abort in case of conflict
libnl and libnl-3 are known to conflict with each other, so detect
and abort if these two libs are both used directly (e.g. Open MPI
uses libnl-3) or indirectly (e.g. libibverbs.so might depend on libnl)
2016-10-25 09:23:59 +09:00
Ralph Castain
227d4d9609 Open the conduits for application procs - we probably can remove all the
RML-related frameworks from MPI applications now, but let's wait a bit
to ensure we have cleaned up all the points where messaging might occur.
2016-10-24 16:53:19 -07:00
rhc54
dae02c7e43 Merge pull request #2282 from rhc54/topic/routed
Update ORTE to more fully support the new conduit messaging system
2016-10-24 08:06:36 -07:00
Gilles Gouaillardet
4a886ac4cc pml/ob1: correctly reset receive request type before init
recvreq->req_recv.req_base.req_type should always be set before invoking
MCA_PML_OB1_RECV_REQUEST_INIT(recvreq, ...) otherwise, the previous type
might be set, and you could end up with MPC_PML_REQUEST_IMPROBE when
MCA_PML_REQUEST_RECV is expected.

Thanks Chris Pattison for the report and test case.

Fixes open-mpi/ompi#2275
2016-10-24 16:50:23 +09:00
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
Gilles Gouaillardet
055df6f7c6 fortran: correctly defines MPI_DISPLACEMENT_CURRENT with KIND=MPI_OFFSET_KIND
and remove unused ompi/include/mpif-mpi-io.h
2016-10-24 09:53:53 +09:00
Gilles Gouaillardet
e2769e4343 fortran/use-mpi-ignore-tkr: fix typo in MPI_File_write_at_all_begin interface 2016-10-24 09:53:52 +09:00
Gilles Gouaillardet
e02dc1e637 fortran/use-mpi-ignore-trk: fix typo in MPI_File_read_ordered_begin interface 2016-10-24 09:53:52 +09:00
Gilles Gouaillardet
6714f6aee7 coll/libnbc: fix MPI_Ialltoallv with MPI_IN_PLACE and without MPI param check 2016-10-24 09:29:06 +09:00
Gilles Gouaillardet
98f62690f1 man: fix typos in MPI_Info_get_{nkeys,nthkey}
Thanks Nicolas Joly for the patch
2016-10-24 09:04:29 +09:00
Jeff Squyres
7d7cf100ea Merge pull request #2268 from karasevb/fix_oshmem_examples
oshmem: updated API oshmem examples to OSHMEM 1.3
2016-10-22 09:29:16 -04:00
rhc54
4a65b197f9 Merge pull request #2271 from rhc54/topic/ft
Properly mark a node as down and decrease the number of daemons so any
2016-10-21 14:10:09 -05:00
rhc54
900ae15d49 Merge pull request #2221 from bharatpotnuri/master
btl/openib: remove unwanted ompi header inclusion in opal code.
2016-10-21 14:05:55 -05:00
Josh Hursey
d1ecc83e14 Merge pull request #2245 from jjhursey/topic/libnbc-error-path
coll/libnbc: Fix error path on internal error
2016-10-21 13:27:17 -05:00
Ralph Castain
df8ac7b747 Properly mark a node as down and decrease the number of daemons so any
subsequent grpcomm collectives can correctly operate. Note that only the
direct grpcomm component knows how to deal with down nodes.
2016-10-21 09:53:37 -07:00
Joshua Hursey
8748e54c11 coll/libnbc: Fix error path on internal error
* If an error is detected internal to libnbc (e.g., PML truncation error)
   this patch makes sure that the request is completed and the `MPI_ERROR`
   field is set approprately.
 * Make an attempt to cleanup outstanding requests before returning.
   - This is a "best attempt" since not all PMLs support canceling requests.
2016-10-21 11:41:08 -04:00
Boris Karasev
e894a89db8 oshmem: updated API oshmem examples to OSHMEM 1.3 2016-10-21 12:36:53 +06:00
rhc54
2a9f818d24 Merge pull request #2267 from rhc54/topic/pmixup
Update to latest PMIx master
2016-10-21 00:57:28 -05:00