c8258c06e2
of individual regions (each region is a multiple of page size in length), and each process claims its own regions by binding them to its local memory. Each process would end up membinding something like 16 individual regions in the overall shmem segment. There were two errors in this code relating to the memory affinity pinning. Some combination of these two errors would lead to kernel panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed shared memory (not POSIX or SysV shared memory, curiously enough): 1. The shared memory segment is initially divided into two regions: control and data. The control starts at the beginning of the shmem segment, the data starts after that. The data portion, unfortunately, was ''not'' aligned to a page. So all the multiple-of-page-size regions that we divvy up were also not aligned on page boundaries. And therefore all the regions we tried to membind were not on page boundaries. The solution was to ensure that the data portion started on a page boundary. Then all of the individual regions were on page boundaries, too. That being said, in my tests, Linux mbind() fails gracefully when the address is not on a page boundary. So I'm not sure how this worked at all / led to a kernel panic... 2. There was some bad pointer math that resulted in membinding regions larger than they should have been, resulting in region overlaps. There were definitely overlaps between regions in the same process; it's likely that there were overlaps between regions of multiple processes, too -- I'm not sure (and don't care to figure out :-) ). The solution was to fix the pointer math so that each region membinds only itself and no neighboring/overlapping regions. cmr:v1.7.2:reviewer=samuel This commit was SVN r28442. |
||
---|---|---|
.. | ||
base | ||
basic | ||
demo | ||
fca | ||
hierarch | ||
inter | ||
libnbc | ||
ml | ||
self | ||
sm | ||
tuned | ||
coll.h | ||
Makefile.am |
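
The two fixes described in the commit message above (start the data portion on a page boundary, and compute each region's extent so it covers only itself) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `layout_t`, `align_up`, `make_layout`, `region_start`, and `check_layout` are hypothetical names, the page size is passed in rather than queried, and the actual memory-binding call is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `page_size`.
 * Assumes `page_size` is a power of two (true for typical page sizes). */
static size_t align_up(size_t offset, size_t page_size)
{
    return (offset + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical layout: a control area at the start of the segment,
 * followed by equally sized data regions. Offsets are relative to the
 * segment base, which mmap() returns page-aligned. */
typedef struct {
    size_t data_offset; /* where the data portion starts */
    size_t region_len;  /* length of one region, a multiple of page size */
} layout_t;

static layout_t make_layout(size_t control_len, size_t pages_per_region,
                            size_t page_size)
{
    layout_t l;
    /* Fix 1: start the data portion on a page boundary. */
    l.data_offset = align_up(control_len, page_size);
    l.region_len = pages_per_region * page_size;
    return l;
}

/* Fix 2: region i spans exactly [start, start + region_len). */
static size_t region_start(const layout_t *l, size_t i)
{
    return l->data_offset + i * l->region_len;
}

/* Sanity check: every region is page-aligned and ends exactly where
 * the next one begins, so no two regions overlap. */
static void check_layout(const layout_t *l, size_t nregions, size_t page_size)
{
    for (size_t i = 0; i < nregions; ++i) {
        size_t start = region_start(l, i);
        assert(start % page_size == 0);
        if (i + 1 < nregions) {
            assert(start + l->region_len == region_start(l, i + 1));
        }
    }
}
```

With a layout like this, each process would bind only the range `[base + region_start(&l, i), base + region_start(&l, i) + l.region_len)` for the regions it owns, so the address and length handed to the binding call are both page multiples.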