Commit graph

545 commits

Author SHA1 Message Date
Nathan Hjelm
408da16d50 ompi/proc: add proc hash table for ompi_proc_t objects
This commit adds an opal hash table to keep track of mapping between
process identifiers and ompi_proc_t's. This hash table is used by the
ompi_proc_by_name() function to look up (in O(1) time) a given
process. This can be used by a BTL or other component to get an
ompi_proc_t when handling an incoming message from an as yet unknown
peer.

Additionally, this commit adds a new MCA variable to control the new
add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in
the process falls below the threshold, an ompi_proc_t is created for
every process. If the number of ranks is above the threshold, then an
ompi_proc_t is only created for the local rank. The code needed to
generate additional ompi_proc_t's for a communicator is not yet
complete.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
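The name-to-proc lookup described in this commit can be illustrated with a small open-addressing hash table. This is a minimal sketch under stated assumptions, not the opal hash table or the ompi_proc_by_name() code: the names proc_name_t, proc_insert and proc_lookup, the fixed table size, and the mixing function are all hypothetical. The two-field uint32_t name layout follows the process name definition mentioned in commit 780c93ee57 further down this log.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64  /* power of two; illustrative only, no resizing */

/* Hypothetical two-field process name (jobid, vpid). */
typedef struct { uint32_t jobid; uint32_t vpid; } proc_name_t;

/* One table slot: the key, an occupancy flag, and an opaque proc pointer. */
typedef struct { proc_name_t name; int in_use; void *proc; } slot_t;

static slot_t table[TABLE_SIZE];

static int same_name(proc_name_t a, proc_name_t b)
{
    return a.jobid == b.jobid && a.vpid == b.vpid;
}

/* Pack both fields into 64 bits, mix them, and mask down to a slot index. */
static size_t slot_for(proc_name_t name)
{
    uint64_t key = ((uint64_t) name.jobid << 32) | name.vpid;
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (size_t) (key & (TABLE_SIZE - 1));
}

/* Insert or overwrite; returns 0 on success, -1 if the table is full. */
static int proc_insert(proc_name_t name, void *proc)
{
    size_t i = slot_for(name);
    for (size_t n = 0; n < TABLE_SIZE; ++n, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (!table[i].in_use || same_name(table[i].name, name)) {
            table[i].name = name;
            table[i].in_use = 1;
            table[i].proc = proc;
            return 0;
        }
    }
    return -1;
}

/* Expected O(1): probe linearly from the hashed slot until a match or a gap. */
static void *proc_lookup(proc_name_t name)
{
    size_t i = slot_for(name);
    for (size_t n = 0; n < TABLE_SIZE; ++n, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (!table[i].in_use) {
            return NULL;
        }
        if (same_name(table[i].name, name)) {
            return table[i].proc;
        }
    }
    return NULL;
}
```

A component handling a message from an unknown peer would call proc_lookup() with the sender's name and create the proc object only when the result is NULL.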
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
Ralph Castain
0d5814b5ca Cleanup Coverity issues 2015-08-29 21:19:27 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Ralph Castain
023936e84b Silence coverity warnings 2015-07-29 07:28:08 -07:00
Ralph Castain
8d128fe090 Remove the non-null attributes from the cmd_line parser as this isn't something we can guarantee, and the optimization isn't worth the potential for error 2015-06-25 13:26:20 -07:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Gilles Gouaillardet
58d1b3f4d0 opal_os_dirpath_create: fix TOCTOU
as reported by Coverity with CID 70396
2015-06-17 11:17:54 +09:00
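A TOCTOU (time-of-check to time-of-use) race in directory creation typically comes from checking whether a path exists and then creating it in a separate step. The sketch below shows one common way to avoid that window: attempt the creation first and inspect errno afterwards. It is an illustration only; make_dir_no_race is a hypothetical helper, and the log does not show how opal_os_dirpath_create was actually changed.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper, not the opal_os_dirpath_create() code: create one
 * directory without a separate "does it exist?" check beforehand.
 * Returns 0 if a directory exists at path on return, -1 otherwise. */
static int make_dir_no_race(const char *path, mode_t mode)
{
    struct stat st;

    if (0 == mkdir(path, mode)) {
        return 0;               /* we created it */
    }
    if (EEXIST != errno) {
        return -1;              /* genuine failure (permissions, missing parent, ...) */
    }
    /* Something already exists at path; accept it only if it is a directory. */
    if (0 == stat(path, &st) && S_ISDIR(st.st_mode)) {
        return 0;
    }
    return -1;
}
```

Because mkdir() itself reports EEXIST atomically, a directory created by another process between two calls is handled as success rather than as an error.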
Gilles Gouaillardet
de66447ebb opal_cmd_line_get_usage_msg: silence warning
as reported by Coverity with CID 1269967
2015-06-17 11:17:54 +09:00
Gilles Gouaillardet
f2f66e6e63 opal_daemon_init: silence warning
as reported by Coverity with CID 710642
2015-06-17 11:17:53 +09:00
Gilles Gouaillardet
8427e87ee9 opal_argv_delete: silence warning
as reported by Coverity with CID 71914
2015-06-17 11:17:53 +09:00
Gilles Gouaillardet
bcdb2d1380 add missing #include
sscanf requires stdio.h
fixes commit open-mpi/ompi@6ca57724c4
2015-06-08 09:13:11 +09:00
Jeff Squyres
0acec2b676 opal/util/net.c: remove stale comment
Also wrap a long "if" statement -- but make no code logic changes.
2015-06-06 10:17:20 -07:00
Jeff Squyres
6ca57724c4 opal/util/net.c: remove superfluous #include 2015-06-06 10:17:20 -07:00
Nathan Hjelm
0e3c32a98a opal/sys_limits: fix coverity issue
CID 996175 Dereference before null check (REVERSE_NULL)

If lims is NULL then we ran out of memory. Return an error and remove
the NULL check at cleanup.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:10 -06:00
Nathan Hjelm
f5389cbb03 opal/keyval: fix coverity issues
CID 1292738 Dereference after null check (FORWARD_NULL)

It is an error if NULL is passed for val in add_to_env_str. Removed
the NULL-check @ keyval_parse.c:253 and added a NULL check and an
error return.

CID 1292737 Logically dead code (DEADCODE)

Coverity is correct, the error code at the end of parse_line_new is
never reached. This means we fail to report parsing errors when
parsing -x and -mca lines in keyval files. I moved the error code into
the loop and removed the checks @ keyval_parse.c:314.

I also named the parse state enum type and updated parse_line_new to
use this type.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:09 -06:00
Nathan Hjelm
9caffa5dd8 mca/base: fix source file name bug for synonyms
This commit fixes synonyms so the source file is correctly printed out
by ompi_info. This commit also adds support for printing out the line
number where the variable is set.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-12 09:52:31 -06:00
Gilles Gouaillardet
c809aace47 initialize common symbols from opal
A few uninitialized common symbols are remaining:

common symbols generated by flex :
 * opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng
 * opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext
 * opal/util/show_help_lex.l: opal_show_help_yyleng
 * opal/util/show_help_lex.l: opal_show_help_yytext

common symbol generated by "external" hwloc library:
 * opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map
2015-05-08 09:48:51 +09:00
Ralph Castain
9cb2fcfa5c Cleanup the qos code when --enable-timings is given 2015-05-06 20:24:27 -07:00
Nadezhda Kogteva
116169c38a opal timing: added ability to choose the timer type 2015-04-17 11:15:55 +03:00
Nathan Hjelm
75f210fdb9 opal/util/error: check for existing convertor for error range
This commit fixes a bug when opal_error_init is called with the same
values multiple times. If opal_error_init is called too many times it
will start failing with OPAL_ERR_OUT_OF_RESOURCE. To fix the problem
check for an existing convertor matching the requested one and return
that one instead.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-09 11:51:36 -06:00
Nathan Hjelm
9cd955badf opal: fix multiple bugs in MCA and opal
This commit fixes the following bugs:

 - opal_output_finalize did not properly set internal state. This
   caused problems when calling the sequence opal_output_init (),
   opal_output_finalize (), opal_output_init ().

 - opal_info support called mca_base_open () but never called the
   matching mca_base_close (). mca_base_open () and mca_base_close ()
   have been updated to use an open count instead of an open flag to
   allow mca_base_open to be called through multiple paths (as may be
   the case when MPI_T is in use).

 - orte_info support did not register opal variables. This can cause
   orte-info to not return opal variables.

 - opal_info, orte_info, and ompi_info support have been updated to
   use a register count.

 - When opening the dl framework the reference count was added to
   ensure the framework stuck around. The framework being closed
   prematurely was a bug in the MCA base that has since been
   corrected. The increment (and associated decrement) have been
   removed.

 - dl/dlopen did not set the value of
   mca_dl_dlopen_component.filename_suffixes_mca_storage on each call
   to register. Instead the value was set in the component
   structure. This caused the value to be lost when re-loading the
   component. Fixed by setting the default value in register.

 - Reset shmem framework state on close to avoid returning a stale
   component after reloading opal/shmem.

 - MCA base parameters were not properly deregistered when the MCA
   base was closed.

This commit may fix #374.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-07 19:13:20 -06:00
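The switch from an open flag to an open count, described for mca_base_open () and mca_base_close () above, can be sketched as follows. This is a minimal illustration, not the MCA base code: subsystem_open, subsystem_close, open_count and resources_ready are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a subsystem's open/close pair. */
static int open_count = 0;
static bool resources_ready = false;

/* Every caller gets success; only the first call performs the setup. */
static int subsystem_open(void)
{
    if (open_count++ > 0) {
        return 0;                /* already open through another path */
    }
    resources_ready = true;      /* one-time initialization goes here */
    return 0;
}

/* Tear down only when the last opener closes; reject unmatched closes. */
static int subsystem_close(void)
{
    if (0 == open_count) {
        return -1;               /* close without a matching open */
    }
    if (--open_count > 0) {
        return 0;                /* other openers still active */
    }
    resources_ready = false;     /* final teardown goes here */
    return 0;
}
```

With a plain flag, the first close would tear the subsystem down even though a second path still relies on it. The count keeps it alive until every open has been matched.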
Elena
90f5b2bb84 Introduce -tune command line option to set env vars and mca params from file 2015-03-26 18:33:53 +02:00
Gilles Gouaillardet
dc0bc756dc iof/base: fix misc memory leak
as reported by Coverity with CID 1196732
2015-03-10 14:37:53 +09:00
Jeff Squyres
0a2767a5d3 opal lt_interface: remove in favor of opal_dl interface 2015-03-09 08:18:13 -07:00
Gilles Gouaillardet
3511475e29 opal/util: fix misc memory leak
as reported by Coverity with CID 996174
2015-02-27 19:19:46 +09:00
Jeff Squyres
9d7171e8f1 convert: remove unnecessary/unused opal_size2int() function
The comments in the file even said "This file will hopefully not last
long in the tree...".
2015-02-16 07:17:33 -08:00
Gilles Gouaillardet
ccbdf64de4 opal/util: fix memory leak in opal_util_init_sys_limits
as reported by Coverity with CID 996174
previous commit (open-mpi/ompi@ca3a275823)
did not fix this CID
2015-02-16 11:05:35 +09:00
Gilles Gouaillardet
ca3a275823 opal/util: fix misc memory leaks reported by Coverity
fixes CID 996174, 996920, 1196735, 1196769 and 1196770
2015-02-13 14:28:59 +09:00
Jeff Squyres
a1037cd70a if.c: fix minor memory leak
This was CID 1269846.
2015-02-12 13:41:29 -08:00
Jeff Squyres
29794af0e9 cmd_line.c: use strncat() instead of strcat()
Be safe about appending to the end of strings.

This was CID 71932 (and probably also others).
2015-02-12 13:41:29 -08:00
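For reference, a bounded append with strncat() needs the remaining room computed by the caller, since the third argument limits the characters copied from the source, not the total size of the destination. The helper below is a hedged sketch; bounded_append is a hypothetical name and is not taken from cmd_line.c.

```c
#include <string.h>

/* Hypothetical helper: append src to the NUL-terminated string in dest,
 * never using more than dest_size bytes in total (including the NUL).
 * Returns 0 if all of src fit, -1 if it was truncated or nothing fit. */
static int bounded_append(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    size_t room;

    if (used >= dest_size) {
        return -1;                    /* no space left, not even for the NUL */
    }
    room = dest_size - used - 1;      /* characters we may still add */
    strncat(dest, src, room);         /* copies at most room chars, then a NUL */
    return strlen(src) <= room ? 0 : -1;
}
```

Unlike strncpy(), strncat() always terminates the result, which is why one byte is reserved for the NUL when computing room.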
Jeff Squyres
e188c75edc opal_environ.c: ensure "value" is a valid string for the setenv() case
This was CID 1269764.
2015-02-12 13:41:29 -08:00
Jeff Squyres
167d72ec68 net.c: ensure to free the args in the error case
This was CID 710643.
2015-02-12 10:24:02 -08:00
Jeff Squyres
08285c6361 lt_interface: properly check OPAL_HAVE_LTDL_ADVISE 2015-02-11 12:25:20 -08:00
Mike Dubman
da5b8c6879 OPAL: skip comparison when fs=autofs in mtab, because we are looking for the real fs type 2014-12-18 21:42:25 +02:00
Artem Polyakov
01601f3284 Merge pull request #305 from artpol84/timing
Timing framework improvement
2014-12-16 15:13:48 +06:00
Mike Dubman
2fbe87defe Merge pull request #314 from miked-mellanox/topic/fix_opal_path_nfs
add support for autofs and make check pass. jenkins: check,src_rpm
2014-12-15 20:52:52 +02:00
Mike Dubman
42f3fa0d1e OPAL: add support for autofs magic type 2014-12-13 20:27:47 +02:00
Jeff Squyres
9e6b157cb6 opal: minor update to guess_strlen
This is a minor update to
open-mpi/ompi@c52601f0c5.

If we have vsnprintf(), we might as well not have the rest of the
guess_strlen() routine.  Also document the nifty trick/behavior of
vsnprintf() that enables this shortcut (it was new to me!).
2014-12-13 08:09:34 -05:00
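The vsnprintf() behavior mentioned here is presumably the C99 rule that a call with a NULL buffer and a size of 0 writes nothing and returns the length the formatted output would have had. A minimal sketch of that shortcut follows; formatted_length is a hypothetical helper, not the guess_strlen() routine itself.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not the Open MPI guess_strlen() code): returns the
 * number of characters the formatted output needs, excluding the
 * terminating NUL, or a negative value on an encoding error. */
static int formatted_length(const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    /* C99: with a NULL buffer and size 0, nothing is written and the
     * return value is the length the full output would have had. */
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    return len;
}
```

A caller can then allocate len + 1 bytes and format a second time into the exact-sized buffer, with no need to estimate the length by walking the format string.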
Ralph Castain
c52601f0c5 It looks like the guess_len function in our local printf.c has some questionable code in it. Now that we are checking in configure for vsnprintf, take advantage of that check to use the far simpler method if it is available. Given that we no longer support such ancient systems where this might not be available, one suspects the other questionable code may no longer be required - but set that aside for another day. 2014-12-12 17:47:17 -08:00
Artem Polyakov
8ffad75a0a Introduce timing interval measurement facility in timing framework 2014-12-10 16:47:49 +06:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Ralph Castain
4e4920a0fd Fix stupid typo 2014-11-05 08:56:40 -08:00
Ralph Castain
2c9987b7d1 Update the opal_environ code so it behaves correctly with the environ if setenv is not available 2014-11-05 08:54:06 -08:00
Ralph Castain
907b4606c5 Check for the presence of setenv. If it is present, then use it in opal_setenv when setting values in the environ 2014-11-04 16:11:54 -08:00
Gilles Gouaillardet
62bde1fcb5 opal/util/proc.c: handle unaligned opal_process_name_t parameters 2014-10-27 14:40:10 +09:00
Gilles Gouaillardet
b5aea782ce Revert "Fix heterogeneous support"
Per the discussion at http://www.open-mpi.org/community/lists/devel/2014/10/16050.php

This reverts commit c9c5d4011b.
2014-10-16 12:24:38 +09:00
Gilles Gouaillardet
c9c5d4011b Fix heterogeneous support
* redefine orte_process_name_t so it can be converted
  between host and network format as an opal_identifier_t
  aka uint64_t by the OPAL layer.
* correctly send OPAL_DSTORE_ARCH key
2014-10-15 17:19:13 +09:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00