Fixmapping VM pages, Vsyscalls

2005-08-15T14:00:00

Faster system calls

The classical Linux system call mechanism is to put the call number in the eax register (in the case of i386) and simply invoke:
int $0x80
Here is a code fragment which measures the time taken to invoke `getpid' 10000000 times:

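#include <stdio.h>
#include <unistd.h>

#define N 10000000
int pid;

int main(void)
{
	int i;

	for (i = 0; i < N; i++) {
		/* 20 is the getpid call number on i386; the result returns in eax */
		asm volatile("movl $20, %%eax \n"
		    "int $0x80 \n"
		    "movl %%eax, pid \n"
		    :
		    :
		    : "eax", "memory");	/* the asm writes the global pid */
	}
	printf("got pid = %d, actual pid = %d\n", pid, getpid());
	return 0;
}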
On my P4 (HT) 2.8GHz system, the program took about 3.9 seconds to execute. Modern Pentium/AMD processors support instructions like sysenter/syscall, which get into kernel mode faster than a software interrupt does. The problem is that checking which mechanism (int 0x80, syscall or sysenter) the processor supports as part of every system call invocation would itself incur an unnecessary overhead. What is the solution?

Fixmapping

It's possible to map hard-coded virtual addresses to physical pages during system bootup - note that only the virtual address is hard coded, the physical address is determined dynamically. The solution which Linus has implemented is: during bootup, get a free page and map it to virtual address 0xffffe000. Determine what kind of syscall mechanism the CPU supports and simply store a few bytes of machine code in that page; machine code which will trap into the kernel using the fastest available mechanism. Now, the user program can execute a system call by simply calling into this page, without having to know which mechanism is in use! Here is a small C program which reads the page at 0xffffe000 and dumps it to stdout; the output can be redirected to a file and analyzed.

#include <string.h>
#include <unistd.h>

int main(void)
{
	char *s, buf[4096];

	s = (char *)0xffffe000;		/* start of the fixmapped page */
	memcpy(buf, s, sizeof(buf));
	write(1, buf, sizeof(buf));
	return 0;
}
We run the program:

./a.out > dat
and do a `file dat'. Here is the output:

dat: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped
The fixmapped page contains an ELF shared object. Let's do:

objdump -d dat
Here is part of the output we get:

Disassembly of section .text:

ffffe400 <.text>:
ffffe400:	51                   	push   %ecx
ffffe401:	52                   	push   %edx
ffffe402:	55                   	push   %ebp
ffffe403:	89 e5                	mov    %esp,%ebp
ffffe405:	0f 34                	sysenter
ffffe407:	90                   	nop
Note that 0xffffe400, which is 0x400 bytes into the fixmapped page, is the start of the code sequence which ultimately traps into the kernel by executing `sysenter'.

Is there a speed up?

Let's find out. Here is a test program:

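#include <stdio.h>
#include <unistd.h>

#define N 10000000
int pid;

int main(void)
{
	int i;

	for (i = 0; i < N; i++) {
		/* same getpid call, but entered through the fixmapped page */
		asm volatile("movl $20, %%eax \n"
		    "call 0xffffe400 \n"
		    "movl %%eax, pid \n"
		    :
		    :
		    : "eax", "memory");	/* the asm writes the global pid */
	}
	printf("got pid = %d, actual pid = %d\n", pid, getpid());
	return 0;
}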
I am getting a run time of 1.4 seconds (down from 3.9 for the int80 version)!

Fixmap your own Hello,World

Problem: Write a Hello,World printing program which doesn't have the sequence "Hello,World" stored in it. Solution: Let's fixmap a page containing "Hello,World".
  1. Edit include/asm/fixmap.h; just add FIX_HELLO_WORLD below FIX_VSYSCALL and change the macro FIXADDR_USER_END to make it look like (FIXADDR_USER_START + 2*PAGE_SIZE)
  2. Edit arch/i386/kernel/sysenter.c. First, add an initialization:
    
    unsigned long page2 = get_zeroed_page(GFP_ATOMIC);
    Then add the code:
    
    __set_fixmap(FIX_HELLO_WORLD, __pa(page2), PAGE_READONLY_EXEC);
    memcpy((void*)page2, "Hello,World", 12);
    That's all. Recompile the kernel (I am using a kernel.org 2.6.12) and write a user program which copies from virtual address 0xffffd000; you should get your Hello,World.
