Commit graph

24526 commits

Author SHA1 Message Date
Nathan Hjelm
d2f5fca82a configure: add a summary section at the end of configure output
This commit adds two m4 macros: OPAL_SUMMARY_ADD, OPAL_SUMMARY_PRINT.
OPAL_SUMMARY_ADD adds an item to a section in the summary. For example
OPAL_SUMMARY_ADD([[Transports]],[[Foo]],...,[yes]) will add the
following to the summary:

Transports
-----------------------
Foo: yes

With this commit two sections are added: Transports, Resource Managers.

The OPAL_SUMMARY_PRINT macro is called after AC_OUTPUT and prints out
some information about the build (version, projects, etc) and then
the summary sections. It will additionally print a warning if
internal debugging is enabled.

Example output:

Open MPI configuration:
-----------------------
Version: 3.0.0 a1
Build Open Platform Abstraction project: yes
Build Open Runtime project: yes
Build Open MPI project: yes
Build Open SHMEM project: no
MPI C++ bindings (deprecated): no
MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
Debug build: yes

Transports
-----------------------
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
KNEM Shared Memory: no
Linux CMA IPC: no
Mellanox MXM: no
Open UCX: no
OpenFabrics libfabric: no
OpenFabrics Verbs: no
portals4: no
QLogic Infinipath (PSM): no
tcp: yes
XPMEM Shared Memory: no

Resource Managers
-----------------------
Cray Alps: no
Grid Engine: no
LSF: no
Slurm: yes
Torque: yes

INTERNAL DEBUGGING IS ENABLED. DO NOT USE THIS BUILD FOR PERFORMANCE MEASUREMENTS!

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-03-08 10:04:15 -07:00
George Bosilca
d6fb56af29 Use the correct printf conversion specifier. 2016-02-23 01:26:27 -06:00
Gilles Gouaillardet
308bbcbad1 ompi/dpm: retrieves OPAL_PMIX_ARCH in heterogeneous mode
also remove code duplication by using ompi_proc_complete_init_single()

Thanks Siegmar Gross for reporting this issue, and Ralph for the guidance.
2016-02-22 11:01:06 +09:00
Gilles Gouaillardet
a4aa4c9571 ompi_proc_complete_init_single: make the subroutine public
and accept a proc from a different job
2016-02-22 11:01:06 +09:00
rhc54
1df4457af2 Merge pull request #1392 from rhc54/topic/dvm
Tools don't create the orte_job_data table, so don't remove jobs from it
2016-02-21 17:42:48 -08:00
Ralph Castain
77f800b7e8 Tools don't create the orte_job_data table, so don't remove jobs from it 2016-02-21 16:29:00 -08:00
Ralph Castain
64b7728f33 Fix typo - do not look at daemon job when considering completion of launch 2016-02-21 14:44:51 -08:00
rhc54
b499d4ba2a Merge pull request #1391 from rhc54/topic/dvm
Convert the orte_job_data pointer array to a hash table so it doesn't…
2016-02-21 13:07:59 -08:00
Ralph Castain
d653cf2847 Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM. 2016-02-21 11:55:49 -08:00
Ralph Castain
309e23ab3a Fix minor typo 2016-02-20 01:33:10 -08:00
yohann
59b6d041f8 mtl/ofi: Check allocated pointer. 2016-02-19 16:59:47 -08:00
yohann
bd47062764 mtl/ofi: Fix error handling. 2016-02-19 16:58:41 -08:00
yohann
404987e9b3 mtl/ofi: Fix mismatching types. 2016-02-19 16:57:26 -08:00
yohann
3ad59435ce mtl/ofi: Prevent possible memory leak. 2016-02-19 16:57:02 -08:00
Ralph Castain
8c92a179c0 Minor memory leak 2016-02-19 15:05:39 -08:00
rhc54
1f7e2d7d41 Merge pull request #1388 from rhc54/topic/iof
Cleanup the output-filename options so they work as expected.
2016-02-19 13:56:05 -08:00
Ralph Castain
0c72ba89b9 Cleanup the output-filename options so they work as expected. Have the remote nodes output locally to the files instead of sending it all back to the HNP.
Fix Solaris issues by renaming struct field
2016-02-19 12:41:46 -08:00
Edgar Gabriel
b33db517c1 Merge pull request #1387 from edgargabriel/dynamic_gen2-overlap
Updates to the dynamic_gen2 component
2016-02-19 12:00:29 -06:00
Edgar Gabriel
92d1b99468 optimize the shuffle step:
 1. use communicator collectives if possible for performance reasons
 2. combined multiple allgathers into a single one
2016-02-19 11:04:04 -06:00
Edgar Gabriel
e63836c653 clean up the mca parameter handling of the component. Add new parameters for number of sub groups and write chunk size. This will allow a systematic parameter study to be performed. 2016-02-19 10:15:28 -06:00
Edgar Gabriel
4f400314e0 add the dynamic_gen2 component into the fcoll selection table. 2016-02-19 09:32:54 -06:00
Edgar Gabriel
268d525053 change the tag to be a positive value. handle 0-byte situations correctly. 2016-02-19 08:28:50 -06:00
Edgar Gabriel
ad79012059 first cut on the version which overlaps the communication/computation of 2 iterations. 2016-02-19 08:28:50 -06:00
Nathan Hjelm
e57ce1e1ef Merge pull request #1384 from hjelmn/xrc_get_fix
btl/openib: XRC save SRQ#s on the loopback endpoint
2016-02-18 21:48:45 -07:00
Nathan Hjelm
2031bb6f01 btl/openib: XRC save SRQ#s on the loopback endpoint
This commit fixes a bug that can occur when communicating via XRC to
peers on the same node. UDCM was not saving the SRQ numbers on the
loopback endpoint (which shares its ib_addr info with all local peers)
so any messages to local peers use an invalid SRQ number.

Fixes open-mpi/ompi#1383

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-02-18 20:59:11 -07:00
rhc54
bfd4254a7b Merge pull request #1382 from rhc54/topic/cleanup
Cleanup some valgrind complaints about jumps with uninitialized values.
2016-02-18 17:29:37 -08:00
Nathan Hjelm
27e7b6e466 Merge pull request #1381 from hjelmn/ddt_colon_fix
orterun: allow DDT if options contain :'s
2016-02-18 17:48:21 -07:00
Nathan Hjelm
820b178384 Merge pull request #1380 from hjelmn/xrc_get_fix
btl/openib: XRC fix bug that could cause an invalid SRQ# to be used
2016-02-18 17:33:09 -07:00
Ralph Castain
6e68d758b9 Cleanup some valgrind complaints about jumps with uninitialized values. Fix a few IOF issues reported by Mark Santcroos when submitting jobs from tools. Add the ability to pass directives to the --output-filename option that tell ORTE to (a) not include the jobid in the path to the output files, and (b) not to copy the output to the tool (i.e., just store it in the files).
ck

Remove stale debug

Fix a segfault if no subscribers are present
2016-02-18 16:30:37 -08:00
Nathan Hjelm
69de442136 orterun: allow DDT if options contain :'s
There is a bug in MPMD detection that disables totalview if a : is
found anywhere on the command line. This includes inside an argument
option or MCA variable value. This commit changes the check to look
for the string " : " instead of the character : which should eliminate
the issue in most cases.
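
The difference between the two checks can be sketched in C as follows. This is only an illustration, not the orterun code: the function names `contains_colon` and `looks_like_mpmd` are hypothetical, and the sketch assumes the command line is available as a single space-joined string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical old-style check: any ':' anywhere counts as MPMD. */
bool contains_colon(const char *cmdline)
{
    return cmdline != NULL && strchr(cmdline, ':') != NULL;
}

/* Hypothetical new-style check: only a ':' surrounded by spaces counts,
 * so colons inside option values (paths, MCA values) are ignored. */
bool looks_like_mpmd(const char *cmdline)
{
    return cmdline != NULL && strstr(cmdline, " : ") != NULL;
}
```

With a command line such as `mpirun -x PATH=/a:/b ./app`, the first check reports a colon while the second does not, which is the case the commit message describes. A " : " sequence inside a quoted argument would still match, consistent with the "in most cases" wording.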

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-18 16:56:08 -07:00
Nathan Hjelm
371df45bf8 btl/openib: fix locking bugs with XRC ib_addr lock
This commit fixes two issues with the ib_addr lock:

 - The ib_addr lock must always be obtained regardless of
   opal_using_threads() as the CPC is run in a separate thread.

 - The ib_addr lock is held in mca_btl_openib_endpoint_connected when
   calling back into the CPC start_connect on any pending
   connections. This will attempt to obtain the ib_addr lock
   again. Since this is not a performance-critical part of the code
   the lock has been changed to be recursive.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-18 15:55:34 -07:00
Nathan Hjelm
4dc73d7765 btl/openib: XRC fix bug that could cause an invalid SRQ# to be used
This commit fixes a bug that occurs when attempting a get or put
operation on an endpoint that is not already connected. In this case
the remote_srqn may be set to an invalid value as the rem_srqs array
on the endpoint is not populated. This commit moves the usage of the
rem_srqs array to the internal put/get functions where it is
guaranteed this array is populated.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-18 15:44:29 -07:00
Ralph Castain
1748f44147 Stop a segfault that results in zombied processes by checking for NULL prior to object release 2016-02-18 13:48:41 -08:00
Jeff Squyres
7b73c868d5 memchecker.h: fix memchecker no-data case
Thanks to @clintonstimpson for reporting the issue.

Fixes open-mpi/ompi#100

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-02-18 10:48:11 -08:00
rhc54
142e38cbb2 Merge pull request #1358 from rhc54/topic/notification
Enable the PMIx notification callback system and fix debugger attach
2016-02-18 10:15:07 -08:00
Ralph Castain
60a7bc2e50 Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion.
Fixes #1225
2016-02-18 09:29:12 -08:00
Nysal Jan K A
c18af0d61f Merge pull request #1378 from nysal/issue_1363
Make UD OOB memory registrations a multiple of page size
2016-02-18 12:34:09 +05:30
rhc54
2745610eb7 Merge pull request #1377 from rhc54/topic/pmix
Plug a leak in the PMIx subsystem
2016-02-17 20:05:45 -08:00
Nysal Jan K.A
cc9b1316a4 Make UD OOB memory registrations a multiple of page size
If ibv_fork_init() has been invoked the pages are marked MADV_DONTFORK.
If we only partially use a page, any data allocated on the remainder of
the page will be inaccessible to the child process.
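
A minimal sketch of the size rounding this implies, assuming a POSIX system. The helper name `round_up_to_page` and the 4096-byte fallback are illustrative assumptions, not taken from the Open MPI source.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round `len` up to the next multiple of the system
 * page size, so a registered buffer never shares a page with other data. */
size_t round_up_to_page(size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    /* Fall back to 4096 if the page size cannot be queried (assumption). */
    size_t page = (ps > 0) ? (size_t) ps : 4096;
    return ((len + page - 1) / page) * page;
}
```

For the registration to cover whole pages, the buffer would also need to start on a page boundary, for example by allocating it with `posix_memalign`.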

Fixes open-mpi/ompi#1363
2016-02-17 22:19:49 -05:00
Ralph Castain
efb0eff43e Plug a leak in the PMIx subsystem 2016-02-17 19:00:36 -08:00
rhc54
dc4d3edc06 Merge pull request #1372 from rhc54/topic/sing
Further enhance the support for Singularity containers.
2016-02-17 16:39:23 -08:00
Nathan Hjelm
92a15cc316 Merge pull request #1374 from hjelmn/tune_fix
Fix parsing of envvars in MCA files
2016-02-17 17:24:33 -07:00
Nathan Hjelm
4a8fbb5ff3 Merge pull request #1373 from hjelmn/xrc_fixes
btl/openib/udcm: fix local XRC connections
2016-02-17 17:07:15 -07:00
Nathan Hjelm
32236736a4 Fix parsing of envvars in MCA files
This commit fixes a memory corruption bug when parsing lines of the
form:

-x FOO=bar

The code was making changes to the size of the buffer allocated for
key_buffer without making the appropriate changes to
key_buffer_len. This was causing subsequent calls to save_param_name
to write to invalid memory.

This commit makes the following changes:

  - Fix the above bug by modifying trim_name to move the string within
    the buffer instead of re-allocating space for the trimmed string.

  - Cleaned up both trim_name and save_param_name. Both functions took
    a prefix and suffix to trim. Problem was the prefix was not
    treated like a prefix. Instead the "prefix" was located inside the
    string using strstr then the trimmed value started after the
    substring (even in the middle of the string). To allow trimming
    both -x and --x (as well as -mca and --mca) trim_name is now
    called with each prefix.
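
The two changes above can be sketched as a single helper that strips a prefix only when the string actually starts with it, and shifts the remainder within the same buffer instead of reallocating. This is not the actual trim_name implementation; `trim_prefix_in_place` is a hypothetical name and signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if `buf` begins with `prefix`, remove the prefix by
 * moving the rest of the string (including the terminating NUL) to the
 * front of the same buffer. The buffer is never reallocated, so any length
 * tracked by the caller stays valid. Returns true if a prefix was removed. */
bool trim_prefix_in_place(char *buf, const char *prefix)
{
    size_t plen = strlen(prefix);
    if (strncmp(buf, prefix, plen) != 0) {
        return false;   /* prefix absent or only found mid-string */
    }
    memmove(buf, buf + plen, strlen(buf + plen) + 1);
    return true;
}
```

A caller would try each accepted spelling in turn, for example `"--x"` and then `"-x"`, mirroring the commit's note that trim_name is now called with each prefix.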

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-17 14:58:05 -07:00
Nathan Hjelm
4f4ea96940 btl/openib/udcm: fix local XRC connections
This commit ensures ib_addr->remote_xrc_rcv_qp_num value is set when
creating the loopback queue pair. This is needed when communicating
with any other local peer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-17 14:54:19 -07:00
Howard Pritchard
72c7558176 Merge pull request #1371 from hppritcha/topic/alps_common_syms
ras/alps: squelch common symbol warnings
2016-02-17 14:35:43 -07:00
Ralph Castain
8f9508cace Further enhance the support for Singularity containers. Extend the "personality" command-line option to allow specifying both model (e.g., "ompi") and container (e.g., "singularity"), and add the necessary logic to support multiple options. Add a new pmix "isolated" component to handle singletons where no HNP is available since containers cannot launch the HNP. 2016-02-17 13:33:06 -08:00
Howard Pritchard
31841b4367 ras/alps: squelch common symbol warnings
squelch a couple of warnings from the common symbols
script.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-02-17 13:27:29 -06:00
Jeff Squyres
d544e0e6e0 Merge pull request #1347 from ggouaillardet/topic/dss_tests
test/dss: update tests to make them usable again, and run them
2016-02-17 09:01:36 -05:00
Nathan Hjelm
2a728f3cbc Merge pull request #1367 from hjelmn/xrc_fixes
Fix XRC support
2016-02-16 23:43:56 -07:00