Saturday, June 04, 2011

Page Faults

While going through some code emitted by a Just-In-Time compiler (JIT), I’ve encountered a curious piece of code which suggested that if access to the process' utilisable stack space isn’t done incrementally, it will cause an “access violation”.

Normally, I wouldn’t have bothered with the problem. But in this case, the JIT-ed code makes uninitialised reads to the process stack, causing valgrind to generate a huge amount of spurious warnings in its log. This makes it difficult to sieve through relevant details, and makes it impossible to generate a static suppression for because the JIT-ed code's call frames are dynamically generated and arbitrary in nature.

I didn’t see what’s wrong with touching memory that's already accessible by the application, so I didn’t really grok what the “access violation” exactly implies. A little sleuthing is required, so I wrote a little code to test out the “access violation”:

#.equ INCREMENT, 0x1000
.equ INCREMENT, 0x1800

.section .text

 .string "%d\n"

.globl _start
 push %rbp
 movq %rsp, %rbp

 # loop and keep touching stack space
 movq $640, %rcx
 movq $-0x1000, %rbx
 movq (%rbp, %rbx), %rax
 subq $INCREMENT, %rbx

 movq %rbx, %r12   # use callee-save to prevent push to stack   
 movq %rbx, %rsi
 movq $FORMAT_STR, %rdi
 movq $0, %rax
 call printf       # but call pushes to stack too :(
 movq %r12, %rbx
 loop again

 movq $1, %rbx
 movq $1, %rax
 int $0x80

Note the commented code at the first line; this represents the original page size boundary in which the JIT emitted that’s causing the offending uninitialised memory access; if the size isn’t extended, the code will segmentation fault at around the 8MB mark. The corresponds to what ‘ulimit -s’ reports on the OS.

However, if the size gets incremented to 0x1800 bytes for example, the code will segmentation fault way much earlier, at around the 140k mark, which puzzled me. Looking at ‘dmesg’ shows something interesting:

[806331.042666] incremental[28910]: segfault at 7fff983ad8a8 ip 0000000000400256 sp 00007fff983cf8a8 error 4 in incremental[400000+1000]

I’m surprised that the kernel actually reports this error, so I started searching for the error string on Google code search. The likely matches came from cygwin which does mention “access violation” and xen-source where it indicates a page fault.

Reading through the definition suggests that I’m causing a hard page fault, but I wanted to make sure that the error code 4 is exactly meaning this. Some cursory research led me to a pretty helpful CS page explaining page faults, with excerpts from Linux 1.0’s sources; scanning through it which showed that "error 4" means that the error comes from user space (as opposed to kernel space).

The code also indicates that if an allocation exceeds the OS page size, the kernel is free to abort the program, which explains the error. Further research also led me to the getpagesize() system call, which verified that the page size for Linux is set at 4KB.

So mystery solved. I suppose the next thing I can do, is to make a nasty hack in the JIT to make spurious writes instead of reads instead; that should get rid of all the valgrind false positives, but I can’t say it’s the most elegant way of resolving the issue.