openmpi/ompi
Ralph Castain 33ab928e1b ompi_proc_t size reduction: part 1
We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we therefore had to store it somewhere proc-local. Obviously, this carried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs.

Unfortunately, this still results in an 8-byte/proc memory cost as we have a char* pointer in the opal_proc_t that is contained in the ompi_proc_t so that we can store the hostname of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory.

With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication.

All RMs are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving us 8 bytes/proc.
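To make the 8 bytes/proc figure concrete, here is a minimal sketch of the arithmetic. The two structs are hypothetical stand-ins (their fields are invented for illustration and are not the real opal_proc_t layout); the only point is that dropping one pointer member shrinks each record by sizeof(char *), which is 8 bytes on a typical 64-bit platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a proc record that caches a hostname pointer.
 * The field names are assumptions, not the actual opal_proc_t members. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
    uint32_t flags;
    uint32_t arch;
    char *hostname;   /* the cached pointer this change removes */
} demo_proc_with_hostname_t;

/* The same hypothetical record without the cached pointer. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
    uint32_t flags;
    uint32_t arch;
} demo_proc_without_hostname_t;

/* Bytes saved per proc record by dropping the pointer member. */
size_t demo_bytes_saved_per_proc(void)
{
    return sizeof(demo_proc_with_hostname_t)
         - sizeof(demo_proc_without_hostname_t);
}

/* Bytes saved in one process that holds records for nprocs peers. */
size_t demo_total_bytes_saved(size_t nprocs)
{
    return nprocs * demo_bytes_saved_per_proc();
}
```

With one record per peer, a process tracking a million procs would save roughly 8 MB under these assumptions; the actual saving depends on the real struct layout and padding.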

Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances:

(a) in an error message
(b) in a verbose output for debugging purposes

Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today.
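The caller-side pattern this implies can be sketched as follows. This is an illustration only: demo_get_proc_hostname and demo_format_error are hypothetical stand-ins backed by a fixed table, not the real opal_get_proc_hostname or any PMIx call. What carries over is the ownership rule: the lookup returns a malloc'd copy, and the caller frees it once the message has been formatted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hostname table standing in for the data the runtime holds. */
static const char *const demo_hosts[] = { "node000", "node001" };

/* Hypothetical lookup: returns a malloc'd copy of the hostname for `rank`,
 * or NULL if the rank is unknown or allocation fails. Caller must free(). */
char *demo_get_proc_hostname(size_t rank)
{
    size_t nhosts = sizeof(demo_hosts) / sizeof(demo_hosts[0]);
    if (rank >= nhosts) {
        return NULL;
    }
    const char *src = demo_hosts[rank];
    char *copy = malloc(strlen(src) + 1);
    if (copy != NULL) {
        strcpy(copy, src);
    }
    return copy;
}

/* Typical use in an error path: fetch, format, free. */
int demo_format_error(size_t rank, char *buf, size_t buflen)
{
    char *host = demo_get_proc_hostname(rank);
    int n = snprintf(buf, buflen,
                     "connection to rank %zu on host %s failed",
                     rank, host != NULL ? host : "unknown");
    free(host);   /* free(NULL) is a no-op, so this is safe on failure */
    return n;
}
```

Because the copy lives only for the duration of one error or verbose message, the extra malloc/free stays off the critical path, which matches the reasoning above.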

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-23 12:49:44 -07:00
attribute Cleanup singleton detection and data retrieval 2020-03-16 12:25:28 -07:00
class Revert "Update to sync with OMPI master and cleanup to build" 2016-11-22 15:03:20 -08:00
communicator Merge pull request #7228 from devreal/progress-returns 2020-02-28 20:15:37 -05:00
contrib ompitrace: MPI_Address -> MPI_Get_address 2018-05-01 15:20:02 -06:00
datatype Mark predefined empty datatype contiguous. 2019-09-07 14:40:21 +10:00
debuggers Remove ORTE project 2020-02-07 18:20:06 -08:00
dpm Some memchecker cleanup and others. 2020-03-05 16:44:18 -05:00
errhandler Merge pull request #7323 from bosilca/fix/7320 2020-02-27 06:28:44 -05:00
etc Revert "Update to sync with OMPI master and cleanup to build" 2016-11-22 15:03:20 -08:00
file ompi/file: rename ompi_file_t's f_mutex into f_lock 2017-12-01 16:06:22 +09:00
group opal: add types for atomic variables 2018-09-14 10:48:55 -06:00
include Some memchecker cleanup and others. 2020-03-05 16:44:18 -05:00
info Cleanup singleton detection and data retrieval 2020-03-16 12:25:28 -07:00
interlib Resolve the PMIx v3 incompatibility 2020-02-14 21:01:10 -08:00
mca ompi_proc_t size reduction: part 1 2020-03-23 12:49:44 -07:00
message predefined MPI object padding: set to fixed number of bytes (#3634) 2017-06-01 15:28:23 -04:00
mpi Some memchecker cleanup and others. 2020-03-05 16:44:18 -05:00
mpiext Some memchecker cleanup and others. 2020-03-05 16:44:18 -05:00
op Move from the use of regex to compression 2019-02-08 11:11:14 -08:00
patterns Remove ORTE project 2020-02-07 18:20:06 -08:00
peruse mpi/finalized: revamp INITIALIZED/FINALIZED 2018-06-01 13:36:29 -07:00
proc ompi_proc_t size reduction: part 1 2020-03-23 12:49:44 -07:00
request grequestx: fix race condition in initialization 2020-03-20 14:53:28 +01:00
runtime Revert "Ensure we get our local topology" 2020-03-23 11:15:47 -04:00
tools Update PMIx and PRRTE to reduce mpirun complexity 2020-03-20 13:49:12 -07:00
util timings: Fix timings when 'prefix' is used 2020-03-07 09:36:43 -08:00
win ompi: cleanup various string operations 2018-10-14 16:10:20 -07:00
Makefile.am ompi: remove obsolete c++ bindings 2020-02-26 13:04:55 -08:00