Contiguous Memory Allocator

Introduction

I find memory management to be one of the most fascinating subsystems in the Linux kernel, and I take every chance I get to talk about it. This post is inspired by a project I'm currently working on: an embedded Linux platform with a camera connected to the CSI-2 bus.

Before we dig into the problems we could trip over, let's talk briefly about how the kernel handles memory.

Memory subsystem

The memory management subsystem handles a wide spectrum of operations, all of which affect system performance. The subsystem is therefore divided into several parts, each optimized for efficient resource handling in different use cases.

These parts include:

  • Page allocator
  • Buddy system
  • Kmalloc allocator
  • Slab caches
  • Vmalloc allocator
  • Contiguous memory allocator
  • ...

The smallest allocation unit of memory is a page frame. The Memory Management Unit (MMU) does a terrific job of arranging and mapping these page frames of the available physical memory into a virtual address space. Most allocations in the kernel are only virtually contiguous, which is fine for most use cases.

Some hardware/IP blocks require physically contiguous memory to work, though. Direct Memory Access (DMA) transfers are one such case where memory (often) needs to be physically contiguous. Many DMA controllers now support scatter-gather, which lets you hand-pick addresses so that the buffer appears contiguous and then lets the (IO)MMU do the rest.

For this to work, the hardware/IP block must actually perform its memory accesses through the (IO)MMU, which is not always the case.

Multimedia devices such as GPUs or VPUs often require huge blocks of physically contiguous memory and (with exceptions, see Raspberry Pi 4 below) do not make use of the (IO)MMU.

Contiguous memory

To meet this requirement for big chunks of physically contiguous memory, we have to reserve it from main memory during system boot.

Before CMA, we had to use the mem kernel parameter to limit how much of the system memory is available to the allocators in the Linux system.

The memory outside this mem region is not touched by the system and can be remapped into the linear address space by the driver.

Here is the documentation for the mem kernel parameter [1]:

mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
                Amount of memory to be used in cases as follows:

                1 for test;
                2 when the kernel is not able to see the whole
                system memory;
                3 memory that lies after 'mem=' boundary is
                excluded from the hypervisor, then
                assigned to KVM guests.
                4 to limit the memory available for kdump kernel.

                [ARC,MICROBLAZE] - the limit applies only to low memory,
                high memory is not affected.

                [ARM64] - only limits memory covered by the linear
                mapping. The NOMAP regions are not affected.

                [X86] Work as limiting max address. Use together
                with memmap= to avoid physical address space collisions.
                Without memmap= PCI devices could be placed at addresses
                belonging to unused RAM.

                Note that this only takes effects during boot time since
                in above case 3, memory may need be hot added after boot
                if system memory of hypervisor is not sufficient.

The mem parameter has a few drawbacks. The driver needs to know where to find the reserved memory, and the memory lies unused whenever the driver is not accessing it.

Therefore the Contiguous Memory Allocator (CMA) was introduced to manage these reserved memory areas.

The benefit of using CMA is that this area is handled by the allocator algorithms instead of the device driver itself. This lets both devices and the rest of the system allocate and use memory from the CMA area: through the page allocator for regular needs (movable pages only, so they can be migrated away when a contiguous buffer is requested) and through the DMA allocation routines when DMA capabilities are needed.

A few words about Raspberry Pi

Raspberry Pi uses a configuration file (config.txt) that is read by the GPU to initialize the system. The configuration file has many tweakable parameters, and one of those is gpu_mem.

This parameter specifies how much memory (in megabytes) to reserve exclusively for the GPU. This works pretty much like the mem kernel command line parameter described above, with the very same drawbacks. The memory reserved for the GPU is not available to the ARM CPU, so it should be kept as low as your application can work with.

One big difference between the variants of the Raspberry Pi modules is that the Raspberry Pi 4 has a GPU with its own MMU, which allows the GPU to use memory that is dynamically allocated within Linux. The gpu_mem value can therefore be kept small on that platform.

The GPU is normally used for displays, 3D calculations, codecs and cameras. One important thing regarding the camera is that the default camera stack (libcamera) uses CMA memory to allocate buffers instead of the reserved GPU memory. If the GPU is only used for camera purposes, gpu_mem can be kept small.

How much CMA is already reserved?

The easiest way to determine how much memory is reserved for CMA is to consult meminfo:

# grep Cma /proc/meminfo
CmaTotal:         983040 kB
CmaFree:          612068 kB

or look at the boot log:

# dmesg | grep CMA
[    0.000000] Reserved memory: created CMA memory pool at 0x0000000056000000, size 960 MiB
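
To keep an eye on CMA usage over time, the same two meminfo fields can be read from a script. Here is a minimal Python sketch; the function name parse_cma_meminfo is made up for this post, and the sample text is the output shown above rather than a live read of /proc/meminfo.

```python
def parse_cma_meminfo(text):
    """Return the Cma* fields of /proc/meminfo-style text as {name: kB}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.startswith("Cma"):
            fields[key] = int(rest.split()[0])  # first token is the value in kB
    return fields


# Sample taken from the grep output above.
SAMPLE = """\
CmaTotal:         983040 kB
CmaFree:          612068 kB
"""

cma = parse_cma_meminfo(SAMPLE)
total_mib = cma["CmaTotal"] / 1024
used_kb = cma["CmaTotal"] - cma["CmaFree"]
print(f"total: {total_mib:.0f} MiB, in use: {used_kb / 1024:.1f} MiB")
```

The total works out to 960 MiB, which matches the size reported in the boot log. On a real target, pass open("/proc/meminfo").read() instead of the sample text.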

Reserve memory with CMA

The CMA area is reserved during boot and there are a few ways to do this.

By device tree

This is the preferred way to define CMA areas.

This example is taken from the device tree bindings documentation [2]:

reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;

    /* global autoconfigured region for contiguous allocations */
    linux,cma {
        compatible = "shared-dma-pool";
        reusable;
        size = <0x4000000>;
        alignment = <0x2000>;
        linux,cma-default;
    };
};
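
The size and alignment cells are plain byte counts written in hex, which is easy to misread. A small Python sketch to decode them (the helper name human is only for illustration):

```python
def human(nbytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if nbytes < 1024 or unit == "GiB":
            return f"{nbytes:g} {unit}"
        nbytes /= 1024


# Values copied from the linux,cma node above.
size = 0x4000000
alignment = 0x2000

print("size:     ", human(size))
print("alignment:", human(alignment))
print("size is a multiple of alignment:", size % alignment == 0)
```

So the example node reserves 64 MiB, aligned to 8 KiB.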

By kernel command line

The CMA area size can also be specified on the kernel command line. There are tons of references out there stating that the command line parameter is overridden by the device tree, but that sounded odd to me, so I looked it up: the kernel command line overrides the device tree, not the other way around.

At least nowadays:

static int __init rmem_cma_setup(struct reserved_mem *rmem)
{
    ...
    if (size_cmdline != -1 && default_cma) {
        pr_info("Reserved memory: bypass %s node, using cmdline CMA params instead\n",
            rmem->name);
        return -EBUSY;
    }
    ...
}

Here is the documentation for the cma kernel parameter [1]:

cma=nn[MG]@[start[MG][-end[MG]]]
                [KNL,CMA]
                Sets the size of kernel global memory area for
                contiguous memory allocations and optionally the
                placement constraint by the physical address range of
                memory allocations. A value of 0 disables CMA
                altogether. For more information, see
                kernel/dma/contiguous.c
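
To make the syntax concrete, here is a rough Python sketch that splits a cma= value into size, start and end in bytes. It is not the kernel's parser: the name parse_cma_param is made up, and it only accepts decimal numbers with the M and G suffixes from the documented syntax.

```python
import re

_UNITS = {"": 1, "M": 1 << 20, "G": 1 << 30}
_NUM = r"(\d+)([MG]?)"
# size, optionally followed by @start, optionally followed by -end
_CMA_RE = re.compile(rf"^{_NUM}(?:@{_NUM}(?:-{_NUM})?)?$")


def parse_cma_param(value):
    """Return (size, start, end) in bytes; start/end are None if absent."""
    m = _CMA_RE.match(value)
    if m is None:
        raise ValueError(f"unsupported cma= value: {value!r}")

    def to_bytes(num, unit):
        return None if num is None else int(num) * _UNITS[unit]

    g = m.groups()
    return to_bytes(g[0], g[1]), to_bytes(g[2], g[3]), to_bytes(g[4], g[5])


print(parse_cma_param("256M"))         # size only
print(parse_cma_param("64M@512M-1G"))  # size with a placement range
```

For example, cma=256M on the command line asks for a 256 MiB area with no placement constraint.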

By kernel configuration

The kernel configuration can be used to set the size of the CMA area in megabytes, as a percentage of the available memory, or as the minimum or maximum of the two:

CONFIG_CMA
CONFIG_CMA_AREAS
CONFIG_DMA_CMA
CONFIG_DMA_PERNUMA_CMA
CONFIG_CMA_SIZE_MBYTES
CONFIG_CMA_SIZE_SEL_MBYTES
CONFIG_CMA_SIZE_SEL_PERCENTAGE
CONFIG_CMA_SIZE_SEL_MIN
CONFIG_CMA_SIZE_SEL_MAX
CONFIG_CMA_ALIGNMENT
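
As far as I can tell from kernel/dma/contiguous.c, the CONFIG_CMA_SIZE_SEL_* options pick between the fixed size (CONFIG_CMA_SIZE_MBYTES) and a percentage of the total memory (CONFIG_CMA_SIZE_PERCENTAGE, which is not in the list above). The Python sketch below models that choice. It is only an illustration: the function name, the mode strings and the example board are made up, and a 4 KiB page size is assumed.

```python
MIB = 1 << 20


def select_cma_size(mode, size_mbytes, percentage, total_ram, page_size=4096):
    """Model of the CONFIG_CMA_SIZE_SEL_* choice; returns a size in bytes."""
    size_bytes = size_mbytes * MIB
    # The percentage is applied to the page count, then converted to bytes.
    percent_bytes = (total_ram // page_size) * percentage // 100 * page_size
    if mode == "mbytes":
        return size_bytes
    if mode == "percentage":
        return percent_bytes
    if mode == "min":
        return min(size_bytes, percent_bytes)
    if mode == "max":
        return max(size_bytes, percent_bytes)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical board: 1 GiB of RAM, 64 MiB fixed size, 10 percent.
for mode in ("mbytes", "percentage", "min", "max"):
    print(mode, select_cma_size(mode, 64, 10, 1 << 30) // MIB, "MiB")
```

Keep in mind that these defaults only apply when neither the kernel command line nor the device tree specifies a size.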

Conclusion

As soon as we use camera devices with higher resolution and do the image manipulation in the VPU/GPU, we almost always have to increase the CMA area size. Otherwise we end up with errors like this:

cma_alloc: alloc failed, req-size: 8192 pages, ret: -12
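
The two numbers in that message tell us how large the failed request was and why it failed. A small Python sketch to decode them, assuming 4 KiB pages (check getconf PAGESIZE on your target); the function name decode_cma_failure is made up for this post:

```python
import errno
import re

LOG = "cma_alloc: alloc failed, req-size: 8192 pages, ret: -12"
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def decode_cma_failure(line, page_size=PAGE_SIZE):
    """Return (requested bytes, errno name) for a cma_alloc failure line."""
    m = re.search(r"req-size: (\d+) pages, ret: (-?\d+)", line)
    if m is None:
        return None
    pages, ret = int(m.group(1)), int(m.group(2))
    return pages * page_size, errno.errorcode.get(-ret, "unknown")


nbytes, name = decode_cma_failure(LOG)
print(f"requested {nbytes // (1 << 20)} MiB, failed with {name}")
```

Under that assumption the request was for 32 MiB of contiguous memory, and -12 is ENOMEM: the CMA area had no free contiguous range of that size.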