1
1
Граф коммитов

5215 Коммитов

Автор SHA1 Сообщение Дата
rhc54
63ba088d09 Merge pull request #2108 from rhc54/topic/reorder
Mucho thanks to Gilles - his patch to reorder the CPPFLAGS solves the…
2016-09-22 11:04:21 -05:00
Ralph Castain
a14ec3bdbc Mucho thanks to Gilles - his patch to reorder the CPPFLAGS solves the problem of inadvertently picking up hwloc and libevent headers from locations in CPPFLAGS while continuing to build the embedded versions. Also silence a minor warning about an uninitialized var. 2016-09-22 07:39:22 -07:00
Gilles Gouaillardet
c7bf9a0ec9 ess/singleton: fix read on the pipe to spawn'ed orted
and close the pipe on both ends when it is no more needed
2016-09-22 14:21:52 +09:00
Ralph Castain
de7b1494d9 Clean out old cruft from the ORCM project 2016-09-21 00:13:30 -07:00
Gilles Gouaillardet
83399adb3f singleton: "safe" read/write to the pipe between (spawn'ed) orted and singleton 2016-09-20 14:56:58 +09:00
Gilles Gouaillardet
e7ae6975d0 orted: fix spawn in singleton mode
in singleton mode, have the spawn'ed orted invoke orte_pre_condition_transports()
and send the transport key back to the singleton
2016-09-20 14:39:22 +09:00
Gilles Gouaillardet
d84ac9bdc5 orted: remove debug
remove debug code that was added by mistake in open-mpi/ompi@eae9d31784
2016-09-19 19:15:42 +09:00
Gilles Gouaillardet
eae9d31784 pre_condition_transports: code cleanup
replace hard coded "OMPI_MCA_orte_precondition_transports" environment variable name
with macro'ed OPAL_MCA_PREFIX"orte_precondition_transports"
2016-09-19 13:31:47 +09:00
Ralph Castain
e55cc63da9 Remove debug 2016-09-16 07:06:58 -07:00
Ralph Castain
a16b3cc33d Fix some minor complaints - missing "void" in function parameters 2016-09-15 15:18:42 -07:00
Ralph Castain
6f086189e6 Fix trivial typo 2016-09-15 13:10:55 -07:00
Gregory M. Kurtzer
16794cc260 Updates to support Singularity containers v2.2 2016-09-15 09:52:06 -07:00
Gilles Gouaillardet
11ebf3ab23 ess/singleton: when forking hnp, use the PMIX_NAMESPACE sent by the hnp
as the jobid
2016-09-15 13:57:23 +09:00
Gilles Gouaillardet
628c730196 pkgconfig: define the pkgincludedir variable in *.pc files
this has been made necesarry with open-mpi/ompi@12e796dcaf

Refs open-mpi/ompi#2069
2016-09-13 09:50:14 +09:00
Gilles Gouaillardet
e84b35217f oob/tcp: plug a memory leak
as reported by Coverity with CID 1196711
2016-09-08 18:50:18 +09:00
Gilles Gouaillardet
b2a2be0e5a odls: fix memory leak plug
This fixes commit open-mpi/ompi@e2c343cdfc.
2016-09-08 10:02:52 +09:00
Jeff Squyres
fd829ac389 Merge pull request #1982 from jsquyres/pr/fix-pkg-config-static
pkg-config: fix static linking
2016-09-07 14:55:50 -04:00
Jeff Squyres
b811b0a15c Merge pull request #2060 from jsquyres/pr/remove-unused-var
orte proc_info.c: remove unused variable
2016-09-07 06:33:26 -04:00
Artem Polyakov
9eba1b0b75 Merge pull request #2042 from artpol84/pmix_sdirs
Several fixes related to session directories:
2016-09-07 14:15:47 +07:00
Artem Polyakov
a9a7f39773 ess/pmi: fix the comments about MCA/PMIx setting conflict resolution. 2016-09-07 07:47:35 +03:00
Gilles Gouaillardet
be41b120d0 orted: plug misc memory leaks
as reported by Coverity with CID 1362603 and 1362606
2016-09-07 10:08:44 +09:00
Gilles Gouaillardet
e2c343cdfc odls: plus memory leak
as reported by Coverity with CID 710645
2016-09-07 10:08:44 +09:00
Gilles Gouaillardet
c09899f6af plm: plus resource leaks
as reported by Coverity with CIDs 72274 and 1196733
2016-09-07 10:08:44 +09:00
Jeff Squyres
722d5eecf1 orte proc_info.c: remove unused variable
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-06 16:38:15 -07:00
Josh Hursey
f6337f9eae Merge pull request #2047 from jjhursey/topic/mixed-host2
orte: !FQDN implementation to use opal_net_isaddr
2016-09-06 13:08:54 -05:00
Ralph Castain
f85dcaee2a Fixes CID 1369067 and CID 1196684
Fixes CID 1369648

    Fixes CID 1372409
2016-09-06 08:43:15 -07:00
Artem Polyakov
74a11d7832 Fix session dir cleanup code. 2016-09-05 07:53:55 +03:00
Artem Polyakov
dc0ab674de Add PMIx key to provide RM with ability to indicate that it will cleanup
session directories provided at through OPAL_PMIX_TMPDIR,
OPAL_PMIX_NSDIR, OPAL_PMIX_PROCDIR
2016-09-05 07:48:44 +03:00
Artem Polyakov
81195ab724 Several fixes related to session directories:
* enable OMPI to retrieve paths from RM through PMIx
* cleanups related to tempdirs.
2016-09-05 07:48:44 +03:00
Ralph Castain
fb51d65049 Minor change: check for NULL before using the job map to avoid segfault when erroring out prior to creating the map 2016-09-04 07:53:12 -07:00
Joshua Hursey
fe937d1e82 orte: !FQDN implementation to use opal_net_isaddr
* Switch to use opal_net_isaddr() for checking if a name is an IP
   address - as it is a bit cleaner, and uses common functionality.
2016-09-02 13:31:49 -05:00
Ralph Castain
4e0788e9ad Enable PSM to support dynamic processes
Fix comm_spawn to correctly reference the actual parent process that requested the spawn when looking for the parent job object
2016-09-02 10:22:04 -07:00
Ralph Castain
0ea1cff733 Implement notification of completion on comm_spawn'd child jobs. Add a configure flag to enable PMIx 3's shared memory datastore, and set it disable by default so that comm_spawn functions again. Will reverse the default once that feature is fully functional 2016-09-01 13:10:10 -07:00
Gilles Gouaillardet
0b8c58298d oob/usock: fix handling of orte_process_name_t *
orte_process_name_t is aligned on 32 bits, so it cannot simply be casted
into an int64_t. use memcpy() instead

Thanks Paul Hargrove for the report
2016-09-01 13:18:02 +09:00
Ralph Castain
c1050bc01e Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint. Setting OMPI_MEMPROFILE=N causes mpirun to set a timer for N seconds. When the timer fires, mpirun will query each daemon in the job to report its own memory usage plus the average memory usage of its child processes. The Proportional Set Size (PSS) is used for this purpose. 2016-08-31 09:32:07 -07:00
Ralph Castain
9b991bd1f5 Ensure that the "running" state is correctly updated
It is possible that one or more procs could get thru PMIx_Init, and thus be marked as in state "registered", before all local procs have been started. If that happens, then we would report some of the procs in state "running", and the others in state "registered" - which means that the HNP would miss the "running" stage of the state machine.

Thanks to Jingchao Zhang for his patience in tracking this down on the 2.0 branch
2016-08-30 19:24:39 -07:00
Ralph Castain
cfa784c9a6 Since we changed storage to pointers in pmix_value_t, we need to allocate space for those values when unpacking 2016-08-29 20:22:24 -07:00
Josh Hursey
b0d8638824 Merge pull request #2015 from jjhursey/topic/mixed-hostnames
orte: Expand use of !orte_keep_fqdn_hostnames MCA parameter
2016-08-29 09:14:54 -05:00
Ralph Castain
2f6e0fec90 Provide the number of nodes in the job 2016-08-26 14:50:41 -07:00
Joshua Hursey
d26dd2c20e orte: Expand the application of !orte_keep_fqdn_hostnames
* Expand the use of the `orte_keep_fqdn_hostnames` MCA parameter when
   it is set to false.
 * If that parameter is set to false (default) then short hostnames
   (e.g., `node01`) will match with the long hostnames (e.g.,
   `node01.mycluster.org`). This allows a user (or resource manager)
    to mix the use of short and long hostnames.
  - Note that this mechanism does _not_ perform a DNS lookup, but
    instead strips off the FQDN by truncating the hostname string at
    the first `.` character (when not an IP address).
     - By default (`false`) the following is true:
       `node01 == node01.mycluster.org == node01.bogus.com`
       since we use `node01` as the hostname.
2016-08-26 16:09:04 -05:00
Artem Polyakov
55ac3b0be3 orte/schizo: fix binding detection in slurm component
in SLURM 16.05 the SLURM_CPU_BIND_TYPE is equal to "mask_cpu:"
instead of "mask_cpu". Account for that.
2016-08-26 09:55:52 +03:00
rhc54
19b0f4db9f Merge pull request #1995 from rhc54/topic/pe-per-rank
Change the behavior of cpus-per-rank.
2016-08-25 14:38:12 -05:00
Ralph Castain
440eae90ec Correct the binding algorithm to decouple it from oversubscribe.
Oversubscribe stipulates that we allow more procs on the node than assigned slots - it has nothing to do with the number of available pe's. Let overload directives handle the pe situation.
2016-08-24 21:17:22 -07:00
Ralph Castain
92102304b6 Minor typo - init the job_data stdin_target field to 0 for default behavior. Add test. 2016-08-22 21:03:45 -07:00
Gilles Gouaillardet
93e73841f9 ess/singleton: push all PMIX_* environment variables, regardless how many there are 2016-08-23 09:46:55 +09:00
Gilles Gouaillardet
a1e8e58a8a ess/singleton: expects 4 PMIX_* environment variables or more 2016-08-23 09:34:03 +09:00
Ralph Castain
7de4d6922b Change the behavior of cpus-per-rank. We previously counted each cpu against the #slots. However, IBM has pointed out that "slot" is equated to the number of processes allowed to run on each node, and not the number of cpus on the node. This has been a continuing source of confusion, so make the distinction a "hard" one.
Each process occupies a "slot". We automatically set #slots = #cpus if nothing else is told to us. If you want to run more procs and slots, you must tell us to allow oversubscription.

A process can utilize multiple pe's if that option is given. If you try to bind more than one proc to a given pe, then we will error out unless you tell us to allow overloading.
2016-08-22 15:54:41 -07:00
Ralph Castain
9888615e75 Restore the coll/sync module and provide a test to verify its operation 2016-08-20 10:14:52 -07:00
Jeff Squyres
fb894e6e3e pkg-config: fix static linking
We need to list all major project libraries in the private libraries
line to enable static linking to work properly.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-17 20:37:51 -05:00
Jeff Squyres
71ec5cfb43 rsh: robustify the check for plm_rsh_agent default value
Don't strcmp against the default value -- the default value may change
over time.  Instead, check to see if the MCA var source is not
DEFAULT.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-16 06:58:20 -05:00