MMAP memory between kernel and userspace

Letting the kernel allocate memory and letting userspace map it sounds like an easy task, and sure, it is.

There are just a few things that are good to know about page mapping.

The MMU (Memory Management Unit) contains page tables with entries for mapping between virtual and physical addresses. These pages are the smallest unit that the MMU deals with. The size of a page is given by the PAGE_SIZE macro in asm/page.h and is typically 4k for most architectures.

There are a few more useful macros in asm/page.h:

PAGE_SHIFT: How many steps to shift 1 to the left to get PAGE_SIZE.

PAGE_SIZE: Size of a page, defined as (1 << PAGE_SHIFT).

PAGE_ALIGN(len): Rounds len up to the nearest multiple of PAGE_SIZE.
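
For example, on an architecture with 4k pages (PAGE_SHIFT == 12), these work out as follows:

PAGE_SIZE        == (1 << 12)  /* 4096 */
PAGE_ALIGN(100)  == 4096       /* rounded up to one full page */
PAGE_ALIGN(5000) == 8192       /* rounded up to two pages */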

How does mmap(2) work?

Every page table entry has a bit that tells us if the entry is valid in supervisor mode (kernel mode) only. And sure, all memory allocated in kernel space will have this bit set. What the mmap(2) system call does is simply create a new page table entry with a different virtual address that points to the same physical memory page. The difference is that this supervisor bit is not set.

This lets userspace access the memory as if it were a part of the application, because now it is! The kernel is not involved in those accesses at all, so they are really fast.
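
From the userspace side this looks like any other shared mapping. A minimal sketch, assuming the driver exposes a character device node (the name /dev/scan is made up here to match the driver code further down):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>

size_t len = 4096; /* must be a multiple of the page size */
int fd = open("/dev/scan", O_RDWR);
uint8_t *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

if (buf == MAP_FAILED)
        /* handle the error */;

/* buf now points straight at the kernel-allocated pages */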

Magic? Kind of.

The magic is called remap_pfn_range().

What remap_pfn_range() essentially does is update the process's specific page table with these new entries.
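
Its signature, from linux/mm.h, looks like this:

int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
                    unsigned long pfn, unsigned long size, pgprot_t prot);

vma is the virtual memory area handed to the driver's mmap handler, addr is the userspace address to start remapping at, pfn is the page frame number of the first physical page, size is the size of the area in bytes and prot holds the protection bits for the new page table entries.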

Example, please

Allocate memory

As we already know, the smallest unit that the MMU handles is the size of PAGE_SIZE, and mmap(2) only works with full pages. Even if you want to share only 100 bytes, a whole page frame will be remapped and must therefore be allocated in the kernel. The allocated memory must also be page aligned.

__get_free_pages()

One way to allocate pages is with __get_free_pages():

unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)

gfp_mask is commonly set to GFP_KERNEL in process/kernel context and GFP_ATOMIC in interrupt context. The order gives the number of pages to allocate as 2^order; the get_order() helper converts a byte size to such an order.

For example:

u8 *vbuf = (u8 *)__get_free_pages(GFP_KERNEL, get_order(size));

Allocated memory is freed with free_pages(), using the same order as at allocation time.
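
A minimal allocate/free pairing as a sketch (size is whatever byte count you need):

unsigned int order = get_order(size);
unsigned long addr = __get_free_pages(GFP_KERNEL, order);

if (!addr)
        return -ENOMEM;
/* ... hand the pages over to userspace ... */
free_pages(addr, order);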

vmalloc()

A more common (and preferred) way to allocate virtually contiguous memory is with vmalloc(). vmalloc() will always allocate a whole set of pages, no matter what. This is exactly what we want!

Read more about vmalloc() in kmalloc(9).

Allocated memory is freed with vfree().
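
So the whole lifecycle is just a matter of pairing the calls; a minimal sketch:

#include <linux/vmalloc.h>

void *buf = vmalloc(size);

if (!buf)
        return -ENOMEM;
/* ... map buf to userspace ... */
vfree(buf);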

alloc_page()

If you need only one page, alloc_page() will give you that. If this is the case, instead of using remap_pfn_range(), vm_insert_page() will do the work for you. Notice that vm_insert_page() apparently only works on order-0 (single-page) allocations. So if you want to map N pages, you will have to call vm_insert_page() N times, as in the sketch below.
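
A sketch of that loop inside an mmap handler, where pages is assumed to be an array of npages order-0 pages from alloc_page() (both names are made up for illustration):

unsigned long addr = vma->vm_start;
int i, ret;

for (i = 0; i < npages; i++) {
        /* one call per order-0 page */
        ret = vm_insert_page(vma, addr, pages[i]);
        if (ret)
                return ret;
        addr += PAGE_SIZE;
}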

Now some code

Allocation

priv->a_size = ATTRIBUTE_N * ATTRIBUTE_SIZE;

/* page align */
priv->a_size = PAGE_ALIGN(priv->a_size);
priv->a_area = vmalloc(priv->a_size);
if (!priv->a_area)
        return -ENOMEM;

file_operations.mmap

static int scan_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct mmap_priv *priv = file->private_data;
        unsigned long start = vma->vm_start;
        size_t size = vma->vm_end - vma->vm_start;
        unsigned long pfn;
        size_t offset;

        if (size > priv->a_size)
                return -EINVAL;

        /*
         * vmalloc()'ed memory is only virtually contiguous, so remap
         * it to userspace one page at a time.
         */
        for (offset = 0; offset < size; offset += PAGE_SIZE) {
                pfn = vmalloc_to_pfn((u8 *)priv->a_area + offset);
                if (remap_pfn_range(vma, start + offset, pfn,
                                    PAGE_SIZE, PAGE_SHARED))
                        return -EAGAIN;
        }

        vma->vm_flags |= VM_RESERVED; /* do not swap out this VMA */
        return 0;
}
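
For vmalloc()'ed areas the kernel also provides the helper remap_vmalloc_range(vma, addr, pgoff), which does this per-page walk for you; the open-coded loop above just makes the remap_pfn_range() mechanics visible. Note also that newer kernels have dropped VM_RESERVED in favour of VM_DONTEXPAND | VM_DONTDUMP, so pick the flags that match your kernel version.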