openmpi/opal/mca/common/sm
Ralph Castain 33ab928e1b ompi_proc_t size reduction: part 1
We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we therefore had to store it somewhere proc-local. Obviously, this carried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs.

Unfortunately, this still results in an 8-byte-per-proc memory cost, because the opal_proc_t contained in each ompi_proc_t keeps a char* pointer so that we can store the hostname of the other procs when we fall below the cutoff. At scale, this can consume a fair amount of memory.
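To put "a fair amount" in numbers, here is a back-of-the-envelope estimate. It assumes (this is an illustration, not a measurement) a 64-bit build where the pointer costs 8 bytes, a job of N = 10^6 procs in which a process has instantiated a proc structure for every peer, and P = 64 procs per node:

```latex
\text{per process: } 8\,\text{B} \times N = 8 \times 10^{6}\,\text{B} \approx 8\,\text{MB}
\qquad
\text{per node: } 8\,\text{MB} \times P = 8\,\text{MB} \times 64 = 512\,\text{MB}
```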

With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication.

All resource managers (RMs) are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving 8 bytes per proc.

Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd copy of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and freed wherever used. This shouldn't be a burden, as the hostname is only used in one of two circumstances:

(a) in an error message
(b) in a verbose output for debugging purposes

Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today.
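The caller-side contract described above (fetch the hostname, use it in a message, then free it) can be sketched as follows. This is a minimal illustration under stated assumptions: `example_get_proc_hostname()` and `example_report_peer_error()` are hypothetical stand-ins, not Open MPI or PMIx functions, and the "four ranks per node" hostname map is invented for the example. Only the ownership rule (the returned string is malloc'd and the caller must free it) reflects the change described here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for opal_get_proc_hostname(): like a PMIx_Get
 * result, it returns a freshly malloc'd copy that the caller must free. */
static char *example_get_proc_hostname(int rank)
{
    char buf[64];
    size_t len;
    char *copy;

    assert(rank >= 0);
    /* Invented hostname map: four ranks per node. */
    snprintf(buf, sizeof(buf), "node%03d", rank / 4);
    len = strlen(buf) + 1;
    copy = malloc(len);
    if (NULL == copy) {
        return NULL;
    }
    memcpy(copy, buf, len);
    return copy;
}

/* Hypothetical caller: uses the hostname only in an error message,
 * then releases it, so no hostname is cached on the proc. */
static int example_report_peer_error(int rank, char *out, size_t outlen)
{
    int n;
    char *host = example_get_proc_hostname(rank);

    if (NULL == host) {
        return -1;
    }
    n = snprintf(out, outlen, "peer rank %d on host %s failed", rank, host);
    free(host); /* the returned string is malloc'd, so always free it */
    return n;
}
```

Because the string is only needed for error and verbose output, the extra malloc/free sits entirely off the fast path.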

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-23 12:49:44 -07:00
common_sm_mpool.c ompi_proc_t size reduction: part 1 2020-03-23 12:49:44 -07:00
common_sm_mpool.h opal: rework mpool and rcache frameworks 2016-03-14 10:50:41 -06:00
common_sm.c ompi_proc_t size reduction: part 1 2020-03-23 12:49:44 -07:00
common_sm.h opal: add types for atomic variables 2018-09-14 10:48:55 -06:00
configure.m4 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) 2014-07-26 00:47:28 +00:00
help-mpi-common-sm.txt George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) 2014-07-26 00:47:28 +00:00
Makefile.am opal: rework mpool and rcache frameworks 2016-03-14 10:50:41 -06:00
owner.txt add owner files to opa/ompi/orte mca directories 2015-02-22 15:10:23 -07:00