Write a device driver for Zephyr - Part 1

This is the first post in this series. See also part2, part3 and part4.

Overview

The first time I came across Zephyr [1] was at the Embedded Linux Conference in 2016. Once back from the conference, I tried to install it on a Cortex-M EVK board I had on my desk. It did not go smoothly at all. The documentation was not very good back then and I don't think I ever got the system up and running. That's where I left it.

Now, seven years later, I'm going to give it another try. A friend of mine, Benjamin Börjesson, who is an active contributor to the project, has inspired me to test it out once again.

So I took whatever I could find at home that could be used for an evaluation. What I found was:

  • A Raspberry Pi Pico [2] to run Zephyr on
  • A Segger J-Link [3] for programming and debugging
  • A Digital-To-Analogue-Converter IC (ltc1665 [4]) that the Zephyr project did not support

Great! Our goal will be to write a driver for the DAC, test it out and contribute to the Zephyr project.

/media/zephyr-logo.png

Zephyr

First, a few words about Zephyr itself. Zephyr is a small Real-Time Operating System (RTOS) which became a hosted collaborative project of the Linux Foundation in 2016.

Zephyr targets small and cheap MCUs with constrained resources rather than the bigger SoCs that usually run Linux. It supports a wide range of architectures and has an extensive suite of kernel services that you can use in your application.

It offers a kernel with a small footprint and a flexible configuration and build system. Every Linux kernel hacker will recognize themselves in the filesystem structure, Kconfig and device trees - which felt good to me.

To me, it feels like a more modern and fresh alternative to FreeRTOS [5] which I am quite familiar with already.

Besides, FreeRTOS uses the Hungarian notation [6], and just avoiding that is actually reason enough for me to choose Zephyr over FreeRTOS. I fully agree with the Linux kernel documentation [7]:

Encoding the type of a function into the name (so-called Hungarian notation) is asinine - the compiler knows the types anyway and can check those, and it only confuses the programmer.

Even if I personally prefer the older version (before our Code-of-Conduct) [8]:

Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can check those, and it only confuses the programmer. No wonder MicroSoft makes buggy programs.

Hardware setup

No fancy hardware setup. I soldered the LTC1665 chip onto a break-out board and connected everything with jumper cables. The electrical interface for the LTC1665 is SPI.

/media/rpi-ltc1665.jpg

The connection between the Raspberry Pi Pico and the J-Link:

Pin RP Pico        Pin J-Link    Signal
"DEBUG SWCLK"      9             SWCLK
"DEBUG GND"        4             GND
"3V3" Pad 36       1             VTref

The connection between Raspberry Pi Pico and LTC1665:

Pin RP Pico          LTC1665         Signal
"SPI0_RX" Pad 16     DIN Pin 9       SPI_RX
"SPI0_CSN" Pad 17    CS Pin 7        SPI_CS
"SPI0_SCK" Pad 18    SCK Pin 8       SPI_SCK
"SPI0_TX" Pad 19     DOUT Pin 10     SPI_TX

Software setup

Install Zephyr

Zephyr uses west [10] for pretty much everything. West is a meta tool used for repository management, building, debugging, deploying.. you name it. It has many similarities with bitbake, which you will find in Yocto. I'm more of a "do one thing and do it well"-guy, so neither of these tools (west nor bitbake) makes a huge impression on me.

West is written in Python, and Python being what it is, you have to create a virtual environment to make sure that your setup will keep working for more than a week. Otherwise you will run into incompatibilities as soon as you upgrade some of the Python dependencies.

The documentation [9] is actually really good nowadays. Most of these commands are just copy&paste from there.

Create a new virtual environment:

python -m venv ~/zephyrproject/.venv

Activate the virtual environment:

source ~/zephyrproject/.venv/bin/activate

Install west:

pip install west

Get the Zephyr source code:

west init ~/zephyrproject
cd ~/zephyrproject
west update

Export a Zephyr CMake package to allow CMake to automatically load boilerplate code required for building Zephyr applications:

west zephyr-export

The Zephyr project contains a file with additional Python dependencies. Install them:

pip install -r ~/zephyrproject/zephyr/scripts/requirements.txt

Install Zephyr SDK

The Zephyr Software Development Kit (SDK) contains toolchains for all architectures supported by Zephyr.

Download the latest SDK bundle:

cd ~
wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.16.0/zephyr-sdk-0.16.0_linux-x86_64.tar.xz
wget -O - https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.16.0/sha256.sum | shasum --check --ignore-missing

Extract the archive:

tar xvf zephyr-sdk-0.16.0_linux-x86_64.tar.xz

Run the setup script:

cd zephyr-sdk-0.16.0
./setup.sh

Build OpenOCD

The Raspberry Pi Pico has an SWD interface that can be used to program and debug the on-board RP2040 MCU.

This interface can be utilized by OpenOCD. Support for the RP2040 is not mainlined though, so we have to go for the Raspberry Pi fork [11].

Clone repository:

git clone https://github.com/raspberrypi/openocd.git
cd openocd

Build:

./bootstrap
./configure
make

And install:

make install

Build sample application

The Raspberry Pi Pico has an LED on board. So blinky, an application that flashes the LED at 1 Hz, is a good test to prove that at least something is alive. Build it:

cd ~/zephyrproject/zephyr
west build -b rpi_pico samples/basic/blinky -- -DOPENOCD=/usr/local/bin/openocd -DOPENOCD_DEFAULT_PATH=/usr/local/share/openocd/scripts -DRPI_PICO_DEBUG_ADAPTER=jlink

Note that we specify the board (-b) as rpi_pico.

OPENOCD and OPENOCD_DEFAULT_PATH should point to where OpenOCD is installed in the previous step.

Flash the application

To flash our Raspberry Pi Pico, we just run:

west flash

As we set RPI_PICO_DEBUG_ADAPTER during the build stage, it is cached and can be omitted from the west flash and west debug commands. Otherwise we would have to provide the --runner option, e.g.:

west flash --runner jlink

You don't have to use a J-Link to flash the Raspberry Pi Pico; you can also copy the UF2 file to the target. If you power up the Pico with the BOOTSEL button pressed, it will appear on the host as a mass storage device where you can simply copy the UF2 file. You lose the possibility to debug with GDB though.

Debug the application

The most straightforward way is to use west to start a GDB session (--runner is still cached from the build stage):

west debug

I prefer to use the Text User Interface (TUI) as it makes it easier to follow the code, both in C and assembler. Enter TUI mode by pressing CTRL+X+A or entering "tui enable" on the command line.

If you do not want to use west, you can start OpenOCD yourself:

openocd -f interface/jlink.cfg -c 'transport select swd' -f target/rp2040.cfg -c "adapter speed 2000" -c 'targets rp2040.core0'

And manually connect with GDB:

gdb-multiarch -tui
(gdb) target extended-remote :3333
(gdb) file ./build/zephyr/zephyr.elf

The result is the same.

/media/zephyr-gdb.png

Summary

Both the hardware and the software environment are now ready for some real work. In part2 we will focus on how to integrate the driver into the Zephyr project.

Write a device driver for Zephyr - Part 2

This is the second post in this series. See also part1, part3 and part4.

Overview

In part1 of this series, we set up the hardware and prepared the software environment. In this part we will focus on pretty much everything but writing the actual driver implementation. We will touch multiple areas in order to fully integrate the driver into the Zephyr project, including:

  • Devicetrees
  • The driver
  • Kconfig
  • Unit tests

Let's introduce each one of them before we start.

Devicetrees

A Devicetree [2] is a data structure that describes the static hardware configuration in a standard manner. One of the motivations behind devicetree is that it should not be specific to any kernel. In the best of worlds, you should be able to boot a Linux kernel, a BSD kernel or Zephyr (well..) with the same devicetree. I've never heard of a working example IRL though, but the idea is good.

In the same way, you should be able to boot the same kernel on different boards by only swapping the devicetree. In Zephyr, the devicetree is integrated into the binary blob, so this idea does not fully apply to Zephyr though.

There are two types of files related to device trees in Zephyr:

  • Devicetree sources - the devicetree itself (including dts, interface files and overlays).
  • Devicetree bindings - descriptions of their content, e.g. data types and which properties are required or optional.

Zephyr makes use of both of these types of files during the build process. This allows the build system to perform build-time validation of the devicetree sources against the bindings, and to generate Kconfig macros and a whole bunch of other macros to be used by the application and by Zephyr itself. We will see examples of these macros later on.
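To give a flavor of what that generation produces: the build drops a header (build/zephyr/include/generated/devicetree_generated.h in recent Zephyr versions) full of node- and property-encoded macros. The names below are fabricated for illustration only - the exact tokens depend on the node's full path in your devicetree:

```
/* Illustrative excerpt, not actual generated output */
#define DT_N_S_soc_S_spi_4003c000_S_dac0_0_P_spi_max_frequency 1000000
#define DT_N_S_soc_S_spi_4003c000_S_dac0_0_P_reg_IDX_0 0
```

Application code never spells these out by hand; the higher-level macros (DT_PROP(), DT_INST() and friends) paste them together.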

Here is a simplified picture of the build process with respect to devicetrees:

/media/zephyr-devicetree.png

Driver

All drivers are located in the ./drivers directory. These are C files that contain the actual implementation of the driver.

Kconfig

Like the Linux kernel (and U-Boot, Busybox, Barebox, Buildroot...), Zephyr uses the Kconfig system to select which subsystems, libraries and drivers to include in the build.

Remember when we built the blinky application in part1? We provided -b rpi_pico to the build command to specify the board:

west build -b rpi_pico ....

This loads ./boards/arm/rpi_pico/rpi_pico_defconfig as the default configuration and stores it in ./build/zephyr/.config, which is the actual configuration the build system will use.

The .config file contains all configuration options selected by e.g. menuconfig AND the configuration options generated from the devicetree.
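For example, once our driver is enabled and a matching devicetree node exists, the relevant excerpt of build/zephyr/.config could look something like this (a sketch - the exact content varies with board and configuration; the symbol names come from the Kconfig file we add below and from Zephyr's devicetree-generated Kconfig symbols):

```
CONFIG_SPI=y
CONFIG_DAC=y
CONFIG_DAC_LTC166X=y
CONFIG_DAC_LTC166X_INIT_PRIORITY=80
CONFIG_DT_HAS_LLTC_LTC1665_ENABLED=y
```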

Unit tests

Zephyr makes use of Twister [1] for unit tests. By default it builds the majority of all tests for a defined set of boards. All these tests are part of the automatic test procedure for every pull request.

Let's start!

First we have to create a few files and integrate them into the build system. The directory hierarchy is similar to the Linux kernel's; luckily for me, it was quite obvious where to put things.

Driver

Create an empty file for now:

touch drivers/dac/dac_ltc166x.c

The driver will support both the ltc1660 (10-bit, 8 channels) and the ltc1665 (8-bit, 8 channels) DACs. I prefer not to name drivers with an x, as there actually are chips out there with an x in their name, so it could be a little misleading. That is at least something we try to avoid in the Linux kernel.

A better name would be just dac_ltc1660.c, supporting all ICs that are compatible with the ltc1660. However, the Zephyr project has chosen to use the x in names to indicate that multiple chips are supported. When in Rome, do as the Romans do.

Add the file to the CMake build system:

diff --git a/drivers/dac/CMakeLists.txt b/drivers/dac/CMakeLists.txt
index b0e86e3bd4..800bc895fd 100644
--- a/drivers/dac/CMakeLists.txt
+++ b/drivers/dac/CMakeLists.txt
@@ -9,6 +9,7 @@ zephyr_library_sources_ifdef(CONFIG_DAC_SAM             dac_sam.c)
 zephyr_library_sources_ifdef(CONFIG_DAC_SAM0           dac_sam0.c)
 zephyr_library_sources_ifdef(CONFIG_DAC_DACX0508       dac_dacx0508.c)
 zephyr_library_sources_ifdef(CONFIG_DAC_DACX3608       dac_dacx3608.c)
+zephyr_library_sources_ifdef(CONFIG_DAC_LTC166X     dac_ltc166x.c)
 zephyr_library_sources_ifdef(CONFIG_DAC_SHELL          dac_shell.c)
 zephyr_library_sources_ifdef(CONFIG_DAC_MCP4725                dac_mcp4725.c)
 zephyr_library_sources_ifdef(CONFIG_DAC_MCP4728                dac_mcp4728.c)

CONFIG_DAC_LTC166X comes from the Kconfig system and will be either 'y' or 'n' depending on whether it is selected or not.

Kconfig

Create two new Kconfig configuration options. One for the driver itself and one for its init priority:

diff --git a/drivers/dac/Kconfig.ltc166x b/drivers/dac/Kconfig.ltc166x
new file mode 100644
index 0000000000..6053bc39bf
--- /dev/null
+++ b/drivers/dac/Kconfig.ltc166x
@@ -0,0 +1,22 @@
+# DAC configuration options
+
+# Copyright (C) 2023 Marcus Folkesson <marcus.folkesson@gmail.com>
+#
+# SPDX-License-Identifier: Apache-2.0
+
+config DAC_LTC166X
+       bool "Linear Technology LTC166X DAC"
+       default y
+       select SPI
+       depends on DT_HAS_LLTC_LTC1660_ENABLED || \
+               DT_HAS_LLTC_LTC1665_ENABLED
+       help
+         Enable the driver for the Linear Technology LTC166X DAC
+
+if DAC_LTC166X
+
+config DAC_LTC166X_INIT_PRIORITY
+       int "Init priority"
+       default 80
+       help
+         Linear Technology LTC166X DAC device driver initialization priority.
+
+endif # DAC_LTC166X

DT_HAS_LLTC_LTC1660_ENABLED and DT_HAS_LLTC_LTC1665_ENABLED are configuration options generated from the selected devicetree. By depending on them, the DAC_LTC166X option will only show up if such a node is specified. I really like this feature.

Also add it to the build structure:

diff --git a/drivers/dac/Kconfig b/drivers/dac/Kconfig
index 7b54572146..77b0db902b 100644
--- a/drivers/dac/Kconfig
+++ b/drivers/dac/Kconfig
@@ -42,6 +42,8 @@ source "drivers/dac/Kconfig.dacx0508"

 source "drivers/dac/Kconfig.dacx3608"

+source "drivers/dac/Kconfig.ltc166x"
+
 source "drivers/dac/Kconfig.mcp4725"

 source "drivers/dac/Kconfig.mcp4728"

Device tree

The bindings for all devices have to be described in YAML format. These bindings are verified at compile time in order to make sure that the devicetree node fulfills all required properties and does not try to invent new ones. This protects us against typos, which is also a really good feature. The Linux kernel does not have this...

We have to create such a binding, one for each chip:

diff --git a/dts/bindings/dac/lltc,ltc1660.yaml b/dts/bindings/dac/lltc,ltc1660.yaml
new file mode 100644
index 0000000000..196204236a
--- /dev/null
+++ b/dts/bindings/dac/lltc,ltc1660.yaml
@@ -0,0 +1,8 @@
+# Copyright (C) 2023 Marcus Folkesson <marcus.folkesson@gmail.com>
+# SPDX-License-Identifier: Apache-2.0
+
+include: [dac-controller.yaml, spi-device.yaml]
+
+description: Linear Technology Micropower octal 10-Bit DAC
+
+compatible: "lltc,ltc1660"
diff --git a/dts/bindings/dac/lltc,ltc1665.yaml b/dts/bindings/dac/lltc,ltc1665.yaml
new file mode 100644
index 0000000000..2c789ecc56
--- /dev/null
+++ b/dts/bindings/dac/lltc,ltc1665.yaml
@@ -0,0 +1,8 @@
+# Copyright (C) 2023 Marcus Folkesson <marcus.folkesson@gmail.com>
+# SPDX-License-Identifier: Apache-2.0
+
+include: [dac-controller.yaml, spi-device.yaml]
+
+description: Linear Technology Micropower octal 8-Bit DAC
+
+compatible: "lltc,ltc1665"

dac-controller.yaml and spi-device.yaml are included to inherit some of the required properties (such as spi-max-frequency) for this type of device.

Unit tests

Add the driver to the test framework and allow the test to be executed on the native_posix platform:

diff --git a/tests/drivers/build_all/dac/testcase.yaml b/tests/drivers/build_all/dac/testcase.yaml
index fa2eb5ac7a..1c7fa521d0 100644
--- a/tests/drivers/build_all/dac/testcase.yaml
+++ b/tests/drivers/build_all/dac/testcase.yaml
@@ -5,7 +5,7 @@ tests:
   drivers.dac.build:
     # will cover I2C, SPI based drivers
     platform_allow: native_posix
-    tags: dac_dacx0508 dac_dacx3608 dac_mcp4725 dac_mcp4728
+    tags: dac_dacx0508 dac_dacx3608 dac_mcp4725 dac_mcp4728 dac_ltc1660 dac_ltc1665
     extra_args: "CONFIG_GPIO=y"
   drivers.dac.mcux.build:
     platform_allow: frdm_k22f

Also add nodes to app.overlay to make it possible for the unit tests to instantiate the DAC:

diff --git a/tests/drivers/build_all/dac/app.overlay b/tests/drivers/build_all/dac/app.overlay
index 471bfae6e8..c1e9146974 100644
--- a/tests/drivers/build_all/dac/app.overlay
+++ b/tests/drivers/build_all/dac/app.overlay
@@ -68,6 +68,8 @@

                        /* one entry for every devices at spi.dtsi */
                        cs-gpios = <&test_gpio 0 0>,
+                                  <&test_gpio 0 0>,
+                                  <&test_gpio 0 0>,
                                   <&test_gpio 0 0>,
                                   <&test_gpio 0 0>;

@@ -118,6 +120,20 @@
                                channel6-gain = <0>;
                                channel7-gain = <0>;
                        };
+
+                       test_spi_ltc1660: ltc1660@3 {
+                               compatible = "lltc,ltc1660";
+                               reg = <0x3>;
+                               spi-max-frequency = <0>;
+                               #io-channel-cells = <1>;
+                       };
+
+                       test_spi_ltc1665: ltc1665@4 {
+                               compatible = "lltc,ltc1665";
+                               reg = <0x4>;
+                               spi-max-frequency = <0>;
+                               #io-channel-cells = <1>;
+                       };
                };
        };
 };

Summary

There is some work that needs to be done to integrate a driver into the Zephyr project, and it has to be done for every driver.

In part3 we will start writing the driver code.

Write a device driver for Zephyr - Part 3

This is the third post in this series. See also part1, part2 and part4.

Overview

In the previous part we prepared Zephyr for our soon to be born driver.

Now we have finally come to the fun point - write the actual driver code!

Driver API

I used to write code for the Linux kernel, which is a little more complex than Zephyr. The Zephyr driver API for DACs must be one of the simplest APIs I have ever seen.

You only have to populate two functions in struct dac_driver_api, found in include/zephyr/drivers/dac.h:

/**
 * DAC driver API
 *
 * This is the mandatory API any DAC driver needs to expose.
 */
__subsystem struct dac_driver_api {
    dac_api_channel_setup channel_setup;
    dac_api_write_value   write_value;
};

Where channel_setup is used to configure the channel:

/**
 * @brief Configure a DAC channel.
 *
 * It is required to call this function and configure each channel before it is
 * selected for a write request.
 *
 * @param dev          Pointer to the device structure for the driver instance.
 * @param channel_cfg  Channel configuration.
 *
 * @retval 0         On success.
 * @retval -EINVAL   If a parameter with an invalid value has been provided.
 * @retval -ENOTSUP  If the requested resolution is not supported.
 */
typedef int (*dac_api_channel_setup)(const struct device *dev,
             const struct dac_channel_cfg *channel_cfg);

dac_channel_cfg specifies the channel and desired resolution:

/**
 * @struct dac_channel_cfg
 * @brief Structure for specifying the configuration of a DAC channel.
 *
 * @param channel_id Channel identifier of the DAC that should be configured.
 * @param resolution Desired resolution of the DAC (depends on device
 *                   capabilities).
 */
struct dac_channel_cfg {
    uint8_t channel_id;
    uint8_t resolution;
};

Our DAC supports 8 channels at 8-bit (LTC1665) or 10-bit (LTC1660) resolution.

write_value is rather self-explanatory:

/**
 * @brief Write a single value to a DAC channel
 *
 * @param dev         Pointer to the device structure for the driver instance.
 * @param channel     Number of the channel to be used.
 * @param value       Data to be written to DAC output registers.
 *
 * @retval 0        On success.
 * @retval -EINVAL  If a parameter with an invalid value has been provided.
 */
typedef int (*dac_api_write_value)(const struct device *dev,
                                uint8_t channel, uint32_t value);

It writes value to channel on dev.

Device tree

We have to create a device node that represents the DAC in order to make it available in Kconfig. During the build, we specified rpi_pico as the board, remember?

west build -b rpi_pico ....

which uses the boards/arm/rpi_pico/rpi_pico.dts device tree. It is possible to add the DAC node directly to rpi_pico.dts, but it is strongly preferred to use overlays.

Device tree overlays

A device tree overlay is a fragment of a device tree that extends or modifies the existing device tree. As we do not want to add the DAC to all rpi_pico boards, but only to those that actually have it connected, overlays are the way to go.

Device tree overlays can be specified in two ways:

  • DTC_OVERLAY_FILE or
  • .overlay files

The CMake variable DTC_OVERLAY_FILE contains a space- or semicolon-separated list of overlay files that will be used to overlay the device tree.

.overlay files, on the other hand, are overlays that the build system will automatically pick up, in the following order:

  1. If the file boards/<BOARD>.overlay exists, it will be used.
  2. If the current board has multiple revisions and boards/<BOARD>_<revision>.overlay exists, it will be used. This file will be used in addition to boards/<BOARD>.overlay if both exist.
  3. If one or more files have been found in the previous steps, the build system stops looking and just uses those files.
  4. Otherwise, if <BOARD>.overlay exists, it will be used, and the build system will stop looking for more files.
  5. Otherwise, if app.overlay exists, it will be used.

Our device tree overlay looks as follows:

&spi0 {
    dac0: dac0@0 {
        compatible = "lltc,ltc1665";
        reg = <0>;
        spi-max-frequency = <1000000>;
        duplex = <0>;
        #io-channel-cells = <8>;
        status = "okay";
    };
};
  • compatible is matching against our driver
  • reg specify chip select 0
  • spi-max-frequency is set to 1MHz
  • duplex specifies duplex mode, 0 equals full duplex
  • status is set to "okay"

Configuration

Once the DAC is added to the device tree, it is time to enable the driver in the configuration as well.

Start menuconfig:

west build -t menuconfig

Navigate to:

Device Drivers -> Digital-to-Analog Converters (DAC) drivers -> Linear Technology LTC166X DAC and add support for the driver.

/media/zephyr-menuconfig.png

(What the heck have they done to menuconfig by the way?! It does not behave nor looks like it used to.)

The driver

The chip itself is quite simple and that is reflected in the driver.

Here is the complete driver code:

/*
 * Driver for Linear Technology LTC1660/LTC1665  DAC
 *
 * Copyright (C) 2023 Marcus Folkesson <marcus.folkesson@gmail.com>
 *
 * SPDX-License-Identifier: Apache-2.0
 */

#include <zephyr/kernel.h>
#include <zephyr/drivers/spi.h>
#include <zephyr/drivers/dac.h>
#include <zephyr/logging/log.h>

LOG_MODULE_REGISTER(dac_ltc166x, CONFIG_DAC_LOG_LEVEL);

#define LTC166X_REG_MASK               GENMASK(15, 12)
#define LTC166X_DATA8_MASK             GENMASK(11, 4)
#define LTC166X_DATA10_MASK            GENMASK(11, 2)

struct ltc166x_config {
    struct spi_dt_spec bus;
    uint8_t resolution;
    uint8_t nchannels;
};

static int ltc166x_reg_write(const struct device *dev, uint8_t addr,
            uint32_t data)
{
    const struct ltc166x_config *config = dev->config;
    uint16_t regval;

    regval = FIELD_PREP(LTC166X_REG_MASK, addr);

    if (config->resolution == 10) {
        regval |= FIELD_PREP(LTC166X_DATA10_MASK, data);
    } else {
        regval |= FIELD_PREP(LTC166X_DATA8_MASK, data);
    }

    const struct spi_buf buf = {
            .buf = &regval,
            .len = sizeof(regval),
    };

    struct spi_buf_set tx = {
        .buffers = &buf,
        .count = 1,
    };

    return spi_write_dt(&config->bus, &tx);
}


static int ltc166x_channel_setup(const struct device *dev,
                   const struct dac_channel_cfg *channel_cfg)
{
    const struct ltc166x_config *config = dev->config;

    if (channel_cfg->channel_id > config->nchannels - 1) {
        LOG_ERR("Unsupported channel %d", channel_cfg->channel_id);
        return -ENOTSUP;
    }

    if (channel_cfg->resolution != config->resolution) {
        LOG_ERR("Unsupported resolution %d", channel_cfg->resolution);
        return -ENOTSUP;
    }

    return 0;
}

static int ltc166x_write_value(const struct device *dev, uint8_t channel,
                uint32_t value)
{
    const struct ltc166x_config *config = dev->config;

    if (channel > config->nchannels - 1) {
        LOG_ERR("unsupported channel %d", channel);
        return -ENOTSUP;
    }

    if (value >= (1 << config->resolution)) {
        LOG_ERR("Value %d out of range", value);
        return -EINVAL;
    }

    return ltc166x_reg_write(dev, channel + 1, value);
}

static int ltc166x_init(const struct device *dev)
{
    const struct ltc166x_config *config = dev->config;

    if (!spi_is_ready_dt(&config->bus)) {
        LOG_ERR("SPI bus %s not ready", config->bus.bus->name);
        return -ENODEV;
    }
    return 0;
}

static const struct dac_driver_api ltc166x_driver_api = {
    .channel_setup = ltc166x_channel_setup,
    .write_value = ltc166x_write_value,
};


#define INST_DT_LTC166X(inst, t) DT_INST(inst, lltc_ltc##t)

#define LTC166X_DEVICE(t, n, res, nchan) \
    static const struct ltc166x_config ltc##t##_config_##n = { \
        .bus = SPI_DT_SPEC_GET(INST_DT_LTC166X(n, t), \
            SPI_OP_MODE_MASTER | \
            SPI_WORD_SET(8), 0), \
        .resolution = res, \
        .nchannels = nchan, \
    }; \
    DEVICE_DT_DEFINE(INST_DT_LTC166X(n, t), \
                &ltc166x_init, NULL, \
                NULL, \
                &ltc##t##_config_##n, POST_KERNEL, \
                CONFIG_DAC_LTC166X_INIT_PRIORITY, \
                &ltc166x_driver_api)

/*
 * LTC1660: 10-bit
 */
#define LTC1660_DEVICE(n) LTC166X_DEVICE(1660, n, 10, 8)

/*
 * LTC1665: 8-bit
 */
#define LTC1665_DEVICE(n) LTC166X_DEVICE(1665, n, 8, 8)

#define CALL_WITH_ARG(arg, expr) expr(arg)

#define INST_DT_LTC166X_FOREACH(t, inst_expr) \
    LISTIFY(DT_NUM_INST_STATUS_OKAY(lltc_ltc##t), \
             CALL_WITH_ARG, (), inst_expr)

INST_DT_LTC166X_FOREACH(1660, LTC1660_DEVICE);
INST_DT_LTC166X_FOREACH(1665, LTC1665_DEVICE);

Most of the driver should be rather self-explanatory. The driver consists of only four functions:

  • ltc166x_reg_write: write data to the actual register.
  • ltc166x_channel_setup: validate the channel configuration provided by the application.
  • ltc166x_write_value: validate data from the application and then call ltc166x_reg_write.
  • ltc166x_init: make sure that the SPI bus is ready. Used by DEVICE_DT_DEFINE.

The only tricky part is the macro-magic that is used for device registration:

#define INST_DT_LTC166X(inst, t) DT_INST(inst, lltc_ltc##t)

#define LTC166X_DEVICE(t, n, res, nchan) \
    static const struct ltc166x_config ltc##t##_config_##n = { \
        .bus = SPI_DT_SPEC_GET(INST_DT_LTC166X(n, t), \
            SPI_OP_MODE_MASTER | \
            SPI_WORD_SET(8), 0), \
        .resolution = res, \
        .nchannels = nchan, \
    }; \
    DEVICE_DT_DEFINE(INST_DT_LTC166X(n, t), \
                &ltc166x_init, NULL, \
                NULL, \
                &ltc##t##_config_##n, POST_KERNEL, \
                CONFIG_DAC_LTC166X_INIT_PRIORITY, \
                &ltc166x_driver_api)

/*
 * LTC1660: 10-bit
 */
#define LTC1660_DEVICE(n) LTC166X_DEVICE(1660, n, 10, 8)

/*
 * LTC1665: 8-bit
 */
#define LTC1665_DEVICE(n) LTC166X_DEVICE(1665, n, 8, 8)

#define CALL_WITH_ARG(arg, expr) expr(arg)

#define INST_DT_LTC166X_FOREACH(t, inst_expr) \
    LISTIFY(DT_NUM_INST_STATUS_OKAY(lltc_ltc##t), \
             CALL_WITH_ARG, (), inst_expr)

INST_DT_LTC166X_FOREACH(1660, LTC1660_DEVICE);
INST_DT_LTC166X_FOREACH(1665, LTC1665_DEVICE);

It became even trickier as I wanted the driver to support both the LTC1660 and the LTC1665. To give some clarity, this is what happens:

  • INST_DT_LTC166X_FOREACH expands for each node compatible with "lltc,ltc1660" or "lltc,ltc1665" in the devicetree.
  • A struct ltc166x_config will be created for each instance and populated by the arguments provided by LTC1665_DEVICE or LTC1660_DEVICE.
  • The ltc166x_driver_api struct is common for all instances.
  • DEVICE_DT_DEFINE creates a device object and sets it up for boot-time initialization.

The documentation [1] describes these macros more in depth.
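As a sanity check, here is a hand-expanded sketch of what the chain produces for a devicetree with a single enabled lltc,ltc1665 node (instance 0). This is an approximation, not actual preprocessor output:

```
INST_DT_LTC166X_FOREACH(1665, LTC1665_DEVICE)
-> LISTIFY(1, CALL_WITH_ARG, (), LTC1665_DEVICE)  /* one instance with status "okay" */
-> CALL_WITH_ARG(0, LTC1665_DEVICE)
-> LTC1665_DEVICE(0)
-> LTC166X_DEVICE(1665, 0, 8, 8)
-> static const struct ltc166x_config ltc1665_config_0 = { ... };
   DEVICE_DT_DEFINE(DT_INST(0, lltc_ltc1665), &ltc166x_init, ...);
```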

Test of the driver

Zephyr has a lot of sample applications. I used samples/drivers/dac/src/main.c to test my driver:

/*
 * Copyright (c) 2020 Libre Solar Technologies GmbH
 *
 * SPDX-License-Identifier: Apache-2.0
 */

#include <zephyr/kernel.h>
#include <zephyr/sys/printk.h>
#include <zephyr/drivers/dac.h>

#define ZEPHYR_USER_NODE DT_PATH(zephyr_user)

#if (DT_NODE_HAS_PROP(ZEPHYR_USER_NODE, dac) && \
        DT_NODE_HAS_PROP(ZEPHYR_USER_NODE, dac_channel_id) && \
        DT_NODE_HAS_PROP(ZEPHYR_USER_NODE, dac_resolution))
#define DAC_NODE DT_PHANDLE(ZEPHYR_USER_NODE, dac)
#define DAC_CHANNEL_ID DT_PROP(ZEPHYR_USER_NODE, dac_channel_id)
#define DAC_RESOLUTION DT_PROP(ZEPHYR_USER_NODE, dac_resolution)
#else
#error "Unsupported board: see README and check /zephyr,user node"
#define DAC_NODE DT_INVALID_NODE
#define DAC_CHANNEL_ID 0
#define DAC_RESOLUTION 0
#endif

static const struct device *const dac_dev = DEVICE_DT_GET(DAC_NODE);

static const struct dac_channel_cfg dac_ch_cfg = {
    .channel_id  = DAC_CHANNEL_ID,
    .resolution  = DAC_RESOLUTION
};

void main(void)
{
    if (!device_is_ready(dac_dev)) {
        printk("DAC device %s is not ready\n", dac_dev->name);
        return;
    }

    int ret = dac_channel_setup(dac_dev, &dac_ch_cfg);

    if (ret != 0) {
        printk("Setting up of DAC channel failed with code %d\n", ret);
        return;
    }

    printk("Generating sawtooth signal at DAC channel %d.\n",
        DAC_CHANNEL_ID);
    while (1) {
        /* Number of valid DAC values, e.g. 4096 for 12-bit DAC */
        const int dac_values = 1U << DAC_RESOLUTION;

        /*
         * 1 msec sleep leads to about 4 sec signal period for 12-bit
         * DACs. For DACs with lower resolution, sleep time needs to
         * be increased.
         * Make sure to sleep at least 1 msec even for future 16-bit
         * DACs (lowering signal frequency).
         */
        const int sleep_time = 4096 / dac_values > 0 ?
            4096 / dac_values : 1;

        for (int i = 0; i < dac_values; i++) {
            ret = dac_write_value(dac_dev, DAC_CHANNEL_ID, i);
            if (ret != 0) {
                printk("dac_write_value() failed with code %d\n", ret);
                return;
            }
            k_sleep(K_MSEC(sleep_time));
        }
    }
}

The application generates a sawtooth signal on DAC_CHANNEL_ID. Here is the result:

/media/zephyr-sawtooth.jpg

Looks great!

Summary

The implementation of the driver was quite straightforward. The only part I actually struggled with was the macros. But in fact, most of the problems I had were due to local build caches. The weird errors disappeared when I rebuilt the whole project. Hrmf.

In part4 of this series we will look at how to contribute this driver back to the Zephyr project.

Write a device driver for Zephyr - Part 4

This is the fourth post in this series. See also part1, part2 and part3.

Overview

This is the fourth and last part of this series, where we will focus on contributing the driver back to the Zephyr project.

Zephyr uses GitHub for hosting the project and all contributions are made through pull requests. The process is well documented [1], both how to contribute and what the project expects from you as a contributor.

I'm not really a fan of GitHub. I prefer to send patches by mail and handle all communication that way, but I should probably realize soon that I'm just getting old and grumpy (needless to say, I prefer IRC over all other chat systems for instant messaging?).

Split up the changes

As we touch multiple areas of the project, we have to break up the changes into multiple commits. This pull request will contain three commits:

Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Wed Apr 5 14:21:47 2023 +0200

    dts: bindings: dac: add bindings for ltc1660/ltc1665

    Add bindings for LTC1665/LTC1660, which is a 8/10-bit
    Digital-to-Analog Converter with eight individual channels.

    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

commit 6dec8308528a6a5fdf123a8bc24e75ba3e0e8cbd
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Wed Apr 5 14:18:00 2023 +0200

    tests: build_all: add entries for ltc1660/ltc1665

    Add the new DAC-drivers to the test suite.

    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

commit b66b7aade39b79fb3d6194be1b6414491f57a828
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Wed Apr 5 14:16:13 2023 +0200

    drivers: dac: add support for ltc1660/ltc1665

    LTC1665/LTC1660 is a 8/10-bit Digital-to-Analog Converter
    (DAC) with eight individual channels.

    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

One mistake I notice now as I am writing this blog post is the commit order: the device tree bindings and the test suite commits should swap places, as the tests depend on the bindings.

However, the PR is already merged.

Requirements on the PR

The Zephyr project has several requirements on each pull request; these are:

  • Each commit in the PR must provide a commit message following the Commit Message Guidelines.
  • All files in the PR must comply with Licensing Requirements.
  • Follow the Zephyr Coding Style and Coding Guidelines.
  • PRs must pass all CI checks. This is a requirement to merge the PR. Contributors may mark a PR as draft and explicitly request reviewers to provide early feedback, even with failing CI checks.
  • When breaking a PR into multiple commits, each commit must build cleanly. The CI system does not enforce this policy, so it is the PR author’s responsibility to verify.
  • When major new functionality is added, tests for the new functionality shall be added to the automated test suite. All API functions should have test cases and there should be tests for the behavior contracts of the API. Maintainers and reviewers have the discretion to determine if the provided tests are sufficient. The examples below demonstrate best practices on how to test APIs effectively.
  • Kernel timer tests provide around 85% test coverage for the kernel timer , measured by lines of code.
  • Emulators for off-chip peripherals are an effective way to test driver APIs. The fuel gauge tests use the smart battery emulator , providing test coverage for the fuel gauge API and the smart battery driver .
  • Code coverage reports for the Zephyr project are available on Codecov.
  • Incompatible changes to APIs must also update the release notes for the next release detailing the change. APIs marked as experimental are excluded from this requirement.
  • Changes to APIs must increment the API version number according to the API version rules.
  • PRs must also satisfy all Merge Criteria before a member of the release engineering team merges the PR into the zephyr tree.

This may look overwhelming to some, but let's break down some of the requirements.

Commit message Guidelines

All commits should have the following format:

[area]: [summary of change]

[Commit message body (must be non-empty)]

Signed-off-by: [Your Full Name] <[your.email@address]>

This is more common sense than something specific to the Zephyr project.

The Signed-off-by: tag should be used for open source licensing reasons. By adding the tag you agree to the Developer Certificate of Origin (DCO) [3]:

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

  (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or
  (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or
  (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.
  (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

License requirements

Zephyr uses the Apache 2.0 license [4], which is a permissive open source license that allows you to freely use, modify, distribute and sell your own product that includes Apache 2.0 licensed software.

The license is specified by an SPDX tag in the header of each source file. E.g.:

/*
 * Copyright (c) 2020 Libre Solar Technologies GmbH
 *
 * SPDX-License-Identifier: Apache-2.0
 */

Coding style

Every project has its own coding style guidelines [5]. Read those carefully. The comments I got on my pull request [2] were only about the coding style:

/media/zephyr-remarks.jpg

Final words

My initial thought with this blog series was to give Zephyr another chance since my evaluation didn't go well the first time.

Many people and organizations do use open source for several (good) reasons, but too few actually contribute back to the projects they make use of. Sometimes it's the company culture that doesn't encourage or see the value in it, but mostly it's just a matter of insecurity on the part of the individual developer.

Therefore, this series changed the focus from purely evaluating Zephyr to instead focusing on all the steps I took to get my code into a project I'm quite unfamiliar with. I even changed the blog subject from "First look into Zephyr" to "Write a device driver for Zephyr".

Hopefully it helps someone see that it is not impossible to actually join in and contribute.

/media/zephyr-merged.jpg

Encrypted storage on i.MX

Brief

Many embedded Linux systems keep some kind of sensitive information in file storage. It could be private keys, passwords or whatever. There is always a risk that this information could be revealed by an unauthorized person who gets physical access to the device. The only protection against attackers who simply bypass the system and access the data storage directly is encryption.

Let's say that we encrypt our sensitive data. Where should we then store the decryption key?

We need to store even that sensitive key in a secure place.

i.MX CAAM

Most of the i.MX SoCs have the Cryptographic Accelerator and Assurance Module (CAAM). This includes both the i.MX6 and i.MX8 SoC series. The only i.MX SoC that I have worked with that does not have the CAAM module is the i.MX6ULL, but there could be more.

The CAAM module has many use cases, and one of those is to generate and handle secure keys that we can use to encrypt/decrypt a file, a partition or a whole disk.

Device mapper

Device mapper is a framework that adds an extra abstraction layer on top of block devices, letting you create virtual block devices that offer additional features. Such features could be snapshots, RAID or, as in our case, disk encryption.

As you can see in the picture below, the device mapper is a layer in between the Block layer and the Virtual File System (VFS) layer:

/media/device-mapper.png

The Linux kernel supports a bunch of different mappers. The current kernel (v6.2) provides the following [1]:

  • dm-delay
  • dm-clone
  • dm-crypt
  • dm-dust
  • dm-ebs
  • dm-flakey
  • dm-ima
  • dm-integrity
  • dm-io
  • dm-queue-length
  • dm-raid
  • dm-service-time
  • dm-zoned
  • dm-era
  • dm-linear
  • dm-log-writes
  • dm-stripe
  • dm-switch
  • dm-verity
  • dm-zero

dm-crypt [2] is the one we will focus on. One cool feature of device mappers is that they are stackable: you could, for example, use dm-crypt on top of a dm-raid mapping. How cool is that?

DM-Crypt

DM-Crypt is a device mapper implementation that uses the Crypto API [3] to transparently encrypt/decrypt all access to the block device. Once the device is mounted, users will not even notice that the data read/written at that mount point is encrypted.

Normally you would use cryptsetup [4] or cryptmount [5], as those are the preferred ways to handle the dm-crypt layer. Here we will use dmsetup though, which is a very low-level (and more difficult) tool.

CAAM Secure Keys

Now it is time to answer the question from the introduction section:

Let's say that we encrypt our sensitive data. Where should we then store the decryption key?

The CAAM module can handle these keys in a secure way by storing them in a protected area that is only readable by the CAAM module itself. In other words, it is not even possible to read out the key. Together with dm-crypt, we can create a master key that never leaves this protected area. On each boot, we generate a derived (session) key, which is the key we can use from userspace. These session keys are called black keys.

How to use it?

Installation

We need to build and install keyctl_caam in order to generate black keys and encapsulate it into a black blob. Download the source code:

git clone https://github.com/nxp-imx/keyctl_caam.git
cd keyctl_caam

And build:

CC=aarch64-linux-gnu-gcc make

I build with an external toolchain prefixed with aarch64-linux-gnu-. If you have a Yocto environment, you could use the toolchain from that SDK instead by sourcing the environment setup script, e.g.:

. ./environment-setup-aarch64-poky-linux
make

You also have to make sure that the following kernel configuration options are enabled:

CONFIG_BLK_DEV_DM=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD=y
CONFIG_DM_CRYPT=y
CONFIG_DM_MULTIPATH=y
CONFIG_CRYPTO_DEV_FSL_CAAM_TK_API=y

Usage

Create a black key from random data, using ECB encryption:

caam-keygen create randomkey ecb -s 16

The files are written to the /data/caam/ folder unless the application is built to use another location (specified with KEYBLOB_LOCATION). Two files should now have been generated:

ls -l /data/caam/
total 8
-rw-r--r-- 1 root root 36 apr 3 21.09 randomkey
-rw-r--r-- 1 root root 96 apr 3 21.09 randomkey.bb

Add the generated black key to the kernel key retention service. For this we use the keyctl command:

cat /data/caam/randomkey | keyctl padd logon logkey: @s

Create a device-mapper device named $ENCRYPTED_LABEL and map it to the block device $DEVICE:

dmsetup -v create $ENCRYPTED_LABEL --table "0 $(blockdev --getsz $DEVICE) crypt capi:tk(cbc(aes))-plain :36:logon:logkey: 0 $DEVICE 0 1 sector_size:512"

The table arguments are: the start sector (0) and size of the mapping, the crypt target, the cipher specification (capi:tk(cbc(aes))-plain selects the CAAM tamper-proof key variant of AES-CBC via the kernel Crypto API), the key reference (:36:logon:logkey: points to our 36-byte logon key in the keyring - note that 36 matches the size of the generated key file), the IV offset, the backing device and its start offset, followed by optional parameters.

Create a filesystem on our newly created mapper device:

mkfs.ext4 -L $VOLUME_LABEL /dev/mapper/$ENCRYPTED_LABEL

Mount it on $MOUNT_POINT:

mount /dev/mapper/$ENCRYPTED_LABEL ${MOUNT_POINT}

Congrats! Your encrypted device is now ready to use! All data written to $MOUNT_POINT will be encrypted on the fly and decrypted upon read.

To illustrate this, create a file on the encrypted volume:

echo "Encrypted data" > ${MOUNT_POINT}/encrypted-file

Clean up and reboot:

umount $MOUNT_POINT
dmsetup remove $ENCRYPTED_LABEL
keyctl clear @s
reboot

A new session key is generated upon each cold boot, so we have to import the key from the blob, add it to the key retention service and recreate the device mapper. This has to be done on each boot:

caam-keygen import $KEYPATH/$KEYNAME.bb $IMPORTKEY
cat $IMPORTKEYPATH/$IMPORTKEY | keyctl padd logon logkey: @s
dmsetup -v create $ENCRYPTED_LABEL --table "0 $(blockdev --getsz $DEVICE) crypt capi:tk(cbc(aes))-plain :36:logon:logkey: 0 $DEVICE 0 1 sector_size:512"
mount /dev/mapper/$ENCRYPTED_LABEL ${MOUNT_POINT}

We will now be able to read back the data from the encrypted device:

cat ${MOUNT_POINT}/encrypted-file
Encrypted data

That was it!

Conclusion

Encryption can be hard, but the CAAM module makes it pretty straightforward. It protects your secrets from physical attacks, which can otherwise be hard to defend against.

However, keep in mind that as soon as the encrypted device is mounted and available to the system, it is free to read for any intruder that has access to the system.

The security chain of a device is no stronger than its weakest link, and you have to identify and handle all potential security risks. This is only one of them.

Bug in the iMX8MP ECSPI module?

Background

I do have a system where I can swap between iMX8M Mini and iMX8M Plus CPU modules on the same carrier board.

I wrote an SPI driver for a device on the carrier board. The device is connected to ECSPI1 (the CPU contains several ECSPI modules) and uses hardware chipselect 0 (SS0). The driver had been used with the iMX8MM CPU module for a while, but as soon as I swapped to the iMX8MP it stopped working.

Both the iMX8MM and the iMX8MP have the same ECSPI IP block, managed by the spi-imx [1] Linux kernel driver, and the application and root filesystem are the same as well.

Same driver, same application, different module. What is happening?

The driver layer did not report anything suspicious either; all SPI transactions contained the data I expected and were successfully sent out on the bus. After debugging the application, driver and device tree for a while, I took a closer look at the actual SPI signals.

SPI signals

I'm not going to describe the SPI interface specifications, please see Wikipedia [2] or such for more details.

It turns out that the chipselect goes inactive after each byte sent, which is weird behavior. The chipselect should stay low during the whole data transaction.

Here are the signals for one transaction of two bytes:

/media/imx8mp-spi-ss0.jpg

The ECSPI module supports dynamic burst sizes, so I experimented with that, without any success.

Workaround

The best workaround I came up with was to MUX the chipselect pin to its GPIO function instead of SS0 and map that GPIO as chipselect for ECSPI1 by overriding the affected properties in the device tree file:

&ecspi1 {
          cs-gpios =
                      <&gpio5 9 GPIO_ACTIVE_LOW>,
                      <&gpio2 8 GPIO_ACTIVE_LOW>;
};

&pinctrl_ecspi1_cs0 {
        fsl,pins = <
                MX8MP_IOMUXC_ECSPI1_SS0__GPIO5_IO09         0x40000
                    >;
};

Then the signals look better:

/media/imx8mp-spi-gpio.jpg

Conclusion

I do not know if all ECSPI modules with all HW chipselects are affected or only SS0 on ECSPI1. I could not find anything about it in the iMX8MP errata.

The fact that the workaround did work makes me suspect a hardware bug in the iMX8MP processor. I guess we will see if it shows up in the errata later on.

Capture a picture with V4L2

Brief

As we have seen before, cameras in Linux can be a complex [1] story and you have to watch every step you take to get it right. libcamera [2] does a great job of simplifying this in a platform-independent way and should be used whenever possible.

But not all cameras have a complex flow chart. Some cameras (e.g. web cameras) are "self-contained": the image data goes straight from the camera to the user application, without any detours through different IP blocks for image processing on its way.

/media/camera-sketch.png

The V4L2 framework is perfectly suited to those simple cameras.

When I searched around for a simple example application that explained the necessary steps to capture images from a camera, I simply could not find what I was looking for. This is my attempt to provide what I failed to find.

V4L2 user space API

Video devices are represented by character devices in a Linux system. The devices show up as /dev/video* and support the following operations:

  • open() - Open a video device
  • close() - Close a video device
  • ioctl() - Send ioctl commands to the device
  • mmap() - Map memory to a driver allocated buffer
  • read() - Read from video device
  • write() - Write to the device

The V4L2 API basically relies on a very large set of IOCTL commands to configure properties and behavior of the camera. The whole API is available from the following header:

#include <linux/videodev2.h>
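Before any ioctl commands can be issued, the device node must be opened with open(2). A minimal sketch (the helper name open_camera is my own, and the /dev/video0 path in the usage example is an assumption; adjust it to your system):

```c
#include <fcntl.h>
#include <stdio.h>

/* Open a V4L2 device node and return its file descriptor, or -1 on
 * failure. Pass O_NONBLOCK in extra_flags to make a later
 * VIDIOC_DQBUF non-blocking instead of waiting for a frame. */
int open_camera(const char *path, int extra_flags)
{
    int fd = open(path, O_RDWR | extra_flags);

    if (fd == -1)
        perror("Open video device");
    return fd;
}
```

Usage: int fd = open_camera("/dev/video0", 0);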

Here is a list of the most common IOCTL commands:

  • VIDIOC_QUERYCAP - Query a list of the supported capabilities. Always query the capabilities to ensure that the camera supports the buffer mode you intend to use.
  • VIDIOC_ENUM_FMT - Enumerate supported image formats.
  • VIDIOC_G_FMT - Get the current image format.
  • VIDIOC_S_FMT - Set a new image format.
  • VIDIOC_REQBUFS - Request a number of buffers that can later be memory mapped by the user application. The application should always check the actual number granted, as the driver may allocate more or fewer than requested.
  • VIDIOC_QUERYBUF - Get buffer information for those buffers earlier requested by VIDIOC_REQBUFS. The information could then be passed to the mmap() system call in order to map that buffer to user space.
  • VIDIOC_QBUF - Queue one of the requested buffers to make it available for the driver to fill with image data. Once the buffer is filled, it is no longer available for new data and should be dequeued by the user.
  • VIDIOC_DQBUF - Dequeue a filled buffer. The command will block if no buffer is available, unless O_NONBLOCK was passed to open().
  • VIDIOC_STREAMON - Turn on streaming. Queued buffers will be filled as soon as data is available.
  • VIDIOC_STREAMOFF - Turn off streaming. This command also flushes the buffer queue.

Buffer management

The V4L2 core maintains two buffer queues internally: one queue (referred to as IN) for incoming (camera->driver) image data and one (referred to as OUT) for outgoing (driver->user) image data.

Buffers are put into the IN queue via the VIDIOC_QBUF command. Once a buffer is filled, it is dequeued from IN and put into the OUT queue, where the data is available to the user.

Whenever the user wants to dequeue a buffer with VIDIOC_DQBUF and a buffer is available, it is taken from the OUT queue and handed to the user application. If no buffer is available, the dequeue operation waits until a buffer is filled, unless the file descriptor was opened with O_NONBLOCK.

Video data can be pushed to userspace in a few different ways:

  • Read I/O - simply perform a read() operation and do not mess with buffers
  • User pointer - the user application allocates buffers and provides them to the driver
  • DMA buf - mostly used for mem2mem devices
  • mmap - let the driver allocate buffers and mmap(2) these to userspace

This post will *only* focus on mmap:ed buffers!

Typical workflow

We will follow these steps in order to acquire frames from the camera:

/media/v4l2-workflow.png

Query capabilities

VIDIOC_QUERYCAP is used to query the supported capabilities. What is most interesting is to verify that the device supports the mode (V4L2_CAP_STREAMING) we want to work with. It is also good manners to verify that it actually is a capture device (V4L2_CAP_VIDEO_CAPTURE) we have opened and nothing else.

The V4L2 API uses a struct v4l2_capability that is passed to the IOCTL. This structure is defined as follows:

/**
  * struct v4l2_capability - Describes V4L2 device caps returned by VIDIOC_QUERYCAP
  *
  * @driver:           name of the driver module (e.g. "bttv")
  * @card:     name of the card (e.g. "Hauppauge WinTV")
  * @bus_info:         name of the bus (e.g. "PCI:" + pci_name(pci_dev) )
  * @version:          KERNEL_VERSION
  * @capabilities: capabilities of the physical device as a whole
  * @device_caps:  capabilities accessed via this particular device (node)
  * @reserved:         reserved fields for future extensions
  */
struct v4l2_capability {
    __u8    driver[16];
    __u8    card[32];
    __u8    bus_info[32];
    __u32   version;
    __u32   capabilities;
    __u32   device_caps;
    __u32   reserved[3];
};

The v4l2_capability.capabilities field is decoded as follows:

/* Values for 'capabilities' field */
#define V4L2_CAP_VIDEO_CAPTURE              0x00000001  /* Is a video capture device */
#define V4L2_CAP_VIDEO_OUTPUT               0x00000002  /* Is a video output device */
#define V4L2_CAP_VIDEO_OVERLAY              0x00000004  /* Can do video overlay */
#define V4L2_CAP_VBI_CAPTURE                0x00000010  /* Is a raw VBI capture device */
#define V4L2_CAP_VBI_OUTPUT         0x00000020  /* Is a raw VBI output device */
#define V4L2_CAP_SLICED_VBI_CAPTURE 0x00000040  /* Is a sliced VBI capture device */
#define V4L2_CAP_SLICED_VBI_OUTPUT  0x00000080  /* Is a sliced VBI output device */
#define V4L2_CAP_RDS_CAPTURE                0x00000100  /* RDS data capture */
#define V4L2_CAP_VIDEO_OUTPUT_OVERLAY       0x00000200  /* Can do video output overlay */
#define V4L2_CAP_HW_FREQ_SEEK               0x00000400  /* Can do hardware frequency seek  */
#define V4L2_CAP_RDS_OUTPUT         0x00000800  /* Is an RDS encoder */

/* Is a video capture device that supports multiplanar formats */
#define V4L2_CAP_VIDEO_CAPTURE_MPLANE       0x00001000
/* Is a video output device that supports multiplanar formats */
#define V4L2_CAP_VIDEO_OUTPUT_MPLANE        0x00002000
/* Is a video mem-to-mem device that supports multiplanar formats */
#define V4L2_CAP_VIDEO_M2M_MPLANE   0x00004000
/* Is a video mem-to-mem device */
#define V4L2_CAP_VIDEO_M2M          0x00008000

#define V4L2_CAP_TUNER                      0x00010000  /* has a tuner */
#define V4L2_CAP_AUDIO                      0x00020000  /* has audio support */
#define V4L2_CAP_RADIO                      0x00040000  /* is a radio device */
#define V4L2_CAP_MODULATOR          0x00080000  /* has a modulator */

#define V4L2_CAP_SDR_CAPTURE                0x00100000  /* Is a SDR capture device */
#define V4L2_CAP_EXT_PIX_FORMAT             0x00200000  /* Supports the extended pixel format */
#define V4L2_CAP_SDR_OUTPUT         0x00400000  /* Is a SDR output device */
#define V4L2_CAP_META_CAPTURE               0x00800000  /* Is a metadata capture device */

#define V4L2_CAP_READWRITE              0x01000000  /* read/write systemcalls */
#define V4L2_CAP_STREAMING              0x04000000  /* streaming I/O ioctls */
#define V4L2_CAP_META_OUTPUT                0x08000000  /* Is a metadata output device */

#define V4L2_CAP_TOUCH                  0x10000000  /* Is a touch device */

#define V4L2_CAP_IO_MC                      0x20000000  /* Is input/output controlled by the media controller */

#define V4L2_CAP_DEVICE_CAPS            0x80000000  /* sets device capabilities field */

Example code on how to use VIDIOC_QUERYCAP:

void query_capabilities(int fd)
{
    struct v4l2_capability cap;

    if (-1 == ioctl(fd, VIDIOC_QUERYCAP, &cap)) {
        perror("Query capabilities");
        exit(EXIT_FAILURE);
    }

    if (!(cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)) {
        fprintf(stderr, "Device is no video capture device\n");
        exit(EXIT_FAILURE);
    }

    if (!(cap.capabilities & V4L2_CAP_READWRITE)) {
        fprintf(stderr, "Device does not support read i/o\n");
    }

    if (!(cap.capabilities & V4L2_CAP_STREAMING)) {
        fprintf(stderr, "Device does not support streaming i/o\n");
    }
}

Capabilities could also be read out with v4l2-ctl:

marcus@goliat:~$ v4l2-ctl -d /dev/video4  --info
Driver Info:
    Driver name      : uvcvideo
    Card type        : USB 2.0 Camera: USB Camera
    Bus info         : usb-0000:00:14.0-8.3.1.1
    Driver version   : 6.0.8
    Capabilities     : 0x84a00001
        Video Capture
        Metadata Capture
        Streaming
        Extended Pix Format
        Device Capabilities
    Device Caps      : 0x04200001
        Video Capture
        Streaming
        Extended Pix Format

Set format

The next step, after we know for sure that the device is a capture device and supports the mode we want to use, is to set up the video format. Otherwise the application could receive video frames in a format it cannot deal with.

Supported formats can be queried with VIDIOC_ENUM_FMT and the current video format can be read out with VIDIOC_G_FMT.

The current format can also be fetched with v4l2-ctl:

marcus@goliat:~$ v4l2-ctl -d /dev/video4  --get-fmt-video
Format Video Capture:
    Width/Height      : 320/240
    Pixel Format      : 'YUYV' (YUYV 4:2:2)
    Field             : None
    Bytes per Line    : 640
    Size Image        : 153600
    Colorspace        : sRGB
    Transfer Function : Rec. 709
    YCbCr/HSV Encoding: ITU-R 601
    Quantization      : Default (maps to Limited Range)
    Flags             :

The v4l2_format struct is defined as follows:

/**
 * struct v4l2_format - stream data format
 * @type:   enum v4l2_buf_type; type of the data stream
 * @pix:    definition of an image format
 * @pix_mp: definition of a multiplanar image format
 * @win:    definition of an overlaid image
 * @vbi:    raw VBI capture or output parameters
 * @sliced: sliced VBI capture or output parameters
 * @raw_data:       placeholder for future extensions and custom formats
 * @fmt:    union of @pix, @pix_mp, @win, @vbi, @sliced, @sdr, @meta
 *          and @raw_data
 */
struct v4l2_format {
    __u32    type;
    union {
        struct v4l2_pix_format              pix;     /* V4L2_BUF_TYPE_VIDEO_CAPTURE */
        struct v4l2_pix_format_mplane       pix_mp;  /* V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE */
        struct v4l2_window          win;     /* V4L2_BUF_TYPE_VIDEO_OVERLAY */
        struct v4l2_vbi_format              vbi;     /* V4L2_BUF_TYPE_VBI_CAPTURE */
        struct v4l2_sliced_vbi_format       sliced;  /* V4L2_BUF_TYPE_SLICED_VBI_CAPTURE */
        struct v4l2_sdr_format              sdr;     /* V4L2_BUF_TYPE_SDR_CAPTURE */
        struct v4l2_meta_format             meta;    /* V4L2_BUF_TYPE_META_CAPTURE */
        __u8        raw_data[200];                   /* user-defined */
    } fmt;
};

To set a format, populate the v4l2_format.type field and fill in the corresponding member of the fmt union.

Example code on how to use VIDIOC_S_FMT:

int set_format(int fd) {
    struct v4l2_format format = {0};
    format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    format.fmt.pix.width = 320;
    format.fmt.pix.height = 240;
    format.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    format.fmt.pix.field = V4L2_FIELD_NONE;
    int res = ioctl(fd, VIDIOC_S_FMT, &format);
    if(res == -1) {
        perror("Could not set format");
        exit(1);
    }
    return res;
}

Request buffers

Once we are done with the format preparations, the next step is to allocate buffers to have somewhere to store the images.

This is exactly what the VIDIOC_REQBUFS ioctl does for you. The command takes a struct v4l2_requestbuffers as argument:

struct v4l2_requestbuffers {
    __u32                   count;
    __u32                   type;           /* enum v4l2_buf_type */
    __u32                   memory;         /* enum v4l2_memory */
    __u32                   capabilities;
    __u8                    flags;
    __u8                    reserved[3];
};

Some of these fields must be populated before we can use it:

  • v4l2_requestbuffers.count - Should be set to the number of memory buffers that should be allocated. It is important to set a number high enough that frames won't be dropped due to lack of queued buffers. The driver is the one who decides what the minimum number is. The application should always check the value of this field after the call, as the driver could grant a different number of buffers than the application actually requested.
  • v4l2_requestbuffers.type - As we are going to use a camera device, set this to V4L2_BUF_TYPE_VIDEO_CAPTURE.
  • v4l2_requestbuffers.memory - Set the streaming method. Available values are V4L2_MEMORY_MMAP, V4L2_MEMORY_USERPTR and V4L2_MEMORY_DMABUF.

Example code on how to use VIDIOC_REQBUF:

int request_buffer(int fd, int count) {
    struct v4l2_requestbuffers req = {0};
    req.count = count;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    if (-1 == ioctl(fd, VIDIOC_REQBUFS, &req))
    {
        perror("Requesting Buffer");
        exit(1);
    }
    return req.count;
}

Query buffer

After the buffers are allocated by the kernel, we have to query the offset of each allocated buffer in order to mmap() it.

The VIDIOC_QUERYBUF ioctl works with the struct v4l2_buffer:

/**
 * struct v4l2_buffer - video buffer info
 * @index:  id number of the buffer
 * @type:   enum v4l2_buf_type; buffer type (type == *_MPLANE for
 *          multiplanar buffers);
 * @bytesused:      number of bytes occupied by data in the buffer (payload);
 *          unused (set to 0) for multiplanar buffers
 * @flags:  buffer informational flags
 * @field:  enum v4l2_field; field order of the image in the buffer
 * @timestamp:      frame timestamp
 * @timecode:       frame timecode
 * @sequence:       sequence count of this frame
 * @memory: enum v4l2_memory; the method, in which the actual video data is
 *          passed
 * @offset: for non-multiplanar buffers with memory == V4L2_MEMORY_MMAP;
 *          offset from the start of the device memory for this plane,
 *          (or a "cookie" that should be passed to mmap() as offset)
 * @userptr:        for non-multiplanar buffers with memory == V4L2_MEMORY_USERPTR;
 *          a userspace pointer pointing to this buffer
 * @fd:             for non-multiplanar buffers with memory == V4L2_MEMORY_DMABUF;
 *          a userspace file descriptor associated with this buffer
 * @planes: for multiplanar buffers; userspace pointer to the array of plane
 *          info structs for this buffer
 * @m:              union of @offset, @userptr, @planes and @fd
 * @length: size in bytes of the buffer (NOT its payload) for single-plane
 *          buffers (when type != *_MPLANE); number of elements in the
 *          planes array for multi-plane buffers
 * @reserved2:      drivers and applications must zero this field
 * @request_fd: fd of the request that this buffer should use
 * @reserved:       for backwards compatibility with applications that do not know
 *          about @request_fd
 *
 * Contains data exchanged by application and driver using one of the Streaming
 * I/O methods.
 */
struct v4l2_buffer {
    __u32                   index;
    __u32                   type;
    __u32                   bytesused;
    __u32                   flags;
    __u32                   field;
    struct timeval          timestamp;
    struct v4l2_timecode    timecode;
    __u32                   sequence;

    /* memory location */
    __u32                   memory;
    union {
        __u32           offset;
        unsigned long   userptr;
        struct v4l2_plane *planes;
        __s32               fd;
    } m;
    __u32                   length;
    __u32                   reserved2;
    union {
        __s32               request_fd;
        __u32               reserved;
    };
};

The structure contains a lot of fields, but in our mmap() example, we only need to fill out a few:

  • v4l2_buffer.type - Buffer type, we use V4L2_BUF_TYPE_VIDEO_CAPTURE.
  • v4l2_buffer.memory - Memory method, still go for V4L2_MEMORY_MMAP.
  • v4l2_buffer.index - As we probably have requested multiple buffers and want to mmap each of them, we have to distinguish the buffers somehow. The index field is a buffer id ranging from 0 to v4l2_requestbuffers.count - 1.

Example code on how to use VIDIOC_QUERYBUF:

int query_buffer(int fd, int index, unsigned char **buffer) {
    struct v4l2_buffer buf = {0};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = index;
    if (ioctl(fd, VIDIOC_QUERYBUF, &buf) == -1) {
        perror("Could not query buffer");
        return -1;
    }

    *buffer = (unsigned char *)mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, buf.m.offset);
    if (*buffer == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    return buf.length;
}

Queue buffers

Before the buffers can be filled with data, they have to be enqueued. Enqueueing a buffer locks the memory pages it uses so that they cannot be swapped out while in use. The buffers remain locked until they are dequeued, the device is closed or streaming is turned off.

VIDIOC_QBUF takes the same argument as VIDIOC_QUERYBUF and has to be populated the same way.

Example code on how to use VIDIOC_QBUF:

int queue_buffer(int fd, int index) {
    struct v4l2_buffer bufd = {0};
    bufd.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    bufd.memory = V4L2_MEMORY_MMAP;
    bufd.index = index;
    if(-1 == ioctl(fd, VIDIOC_QBUF, &bufd))
    {
        perror("Queue Buffer");
        return 1;
    }
    return bufd.bytesused;
}

Start stream

Finally, all preparations are done and we are ready to start the stream! VIDIOC_STREAMON basically informs the V4L2 layer that it can start acquiring video frames and use the queued buffers to store them.

Example code on how to use VIDIOC_STREAMON:

int start_streaming(int fd) {
    unsigned int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(ioctl(fd, VIDIOC_STREAMON, &type) == -1){
        perror("VIDIOC_STREAMON");
        exit(1);
    }
    return 0;
}

Dequeue buffer

Once a buffer is filled with video data, it is ready to be dequeued and consumed by the application. This ioctl blocks (unless O_NONBLOCK is used) until a buffer is available.

As soon as the buffer has been dequeued and processed, the application has to queue it back so that the driver layer can fill it with new frames. This is usually part of the application main loop.

VIDIOC_DQBUF works similarly to VIDIOC_QBUF, but it populates the v4l2_buffer.index field with the index of the buffer that has been dequeued.

Example code on how to use VIDIOC_DQBUF:

int dequeue_buffer(int fd) {
    struct v4l2_buffer bufd = {0};
    bufd.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    bufd.memory = V4L2_MEMORY_MMAP;
    if(-1 == ioctl(fd, VIDIOC_DQBUF, &bufd))
    {
        perror("DeQueue Buffer");
        /* Return -1 on error; 1 would be a valid buffer index */
        return -1;
    }
    return bufd.index;
}

Stop stream

Once we are done with the video capturing, we can stop the streaming. This unlocks all enqueued buffers and stops capturing frames.

Example code on how to use VIDIOC_STREAMOFF:

int stop_streaming(int fd) {
    unsigned int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(ioctl(fd, VIDIOC_STREAMOFF, &type) == -1){
        perror("VIDIOC_STREAMOFF");
        exit(1);
    }
    return 0;
}

Full example

It is not the most beautiful example, but it is at least something to work with.

#include <stdio.h>
#include <stdlib.h>

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

#define NBUF 3

void query_capabilites(int fd)
{
    struct v4l2_capability cap;

    if (-1 == ioctl(fd, VIDIOC_QUERYCAP, &cap)) {
        perror("Query capabilites");
        exit(EXIT_FAILURE);
    }

    if (!(cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)) {
        fprintf(stderr, "Device is not a video capture device\n");
        exit(EXIT_FAILURE);
    }

    if (!(cap.capabilities & V4L2_CAP_READWRITE)) {
        fprintf(stderr, "Device does not support read i/o\n");
    }

    if (!(cap.capabilities & V4L2_CAP_STREAMING)) {
        fprintf(stderr, "Device does not support streaming i/o\n");
        exit(EXIT_FAILURE);
    }
}

int queue_buffer(int fd, int index) {
    struct v4l2_buffer bufd = {0};
    bufd.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    bufd.memory = V4L2_MEMORY_MMAP;
    bufd.index = index;
    if(-1 == ioctl(fd, VIDIOC_QBUF, &bufd))
    {
        perror("Queue Buffer");
        return 1;
    }
    return bufd.bytesused;
}
int dequeue_buffer(int fd) {
    struct v4l2_buffer bufd = {0};
    bufd.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    bufd.memory = V4L2_MEMORY_MMAP;
    bufd.index = 0;
    if(-1 == ioctl(fd, VIDIOC_DQBUF, &bufd))
    {
        perror("DeQueue Buffer");
        /* Return -1 on error; 1 would be a valid buffer index */
        return -1;
    }
    return bufd.index;
}


int start_streaming(int fd) {
    unsigned int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(ioctl(fd, VIDIOC_STREAMON, &type) == -1){
        perror("VIDIOC_STREAMON");
        exit(EXIT_FAILURE);
    }
    return 0;
}

int stop_streaming(int fd) {
    unsigned int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(ioctl(fd, VIDIOC_STREAMOFF, &type) == -1){
        perror("VIDIOC_STREAMOFF");
        exit(EXIT_FAILURE);
    }
    return 0;
}

int query_buffer(int fd, int index, unsigned char **buffer) {
    struct v4l2_buffer buf = {0};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = index;
    if (ioctl(fd, VIDIOC_QUERYBUF, &buf) == -1) {
        perror("Could not query buffer");
        return -1;
    }

    *buffer = (unsigned char *)mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, buf.m.offset);
    if (*buffer == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    return buf.length;
}

int request_buffer(int fd, int count) {
    struct v4l2_requestbuffers req = {0};
    req.count = count;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    if (-1 == ioctl(fd, VIDIOC_REQBUFS, &req))
    {
        perror("Requesting Buffer");
        exit(EXIT_FAILURE);
    }
    return req.count;
}

int set_format(int fd) {
    struct v4l2_format format = {0};
    format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    format.fmt.pix.width = 320;
    format.fmt.pix.height = 240;
    format.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    format.fmt.pix.field = V4L2_FIELD_NONE;
    int res = ioctl(fd, VIDIOC_S_FMT, &format);
    if(res == -1) {
        perror("Could not set format");
        exit(EXIT_FAILURE);
    }
    return res;
}

int main() {
    unsigned char *buffer[NBUF];
    int fd = open("/dev/video4", O_RDWR);
    int size;
    int index;
    int nbufs;

    if (fd < 0) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    query_capabilites(fd);
    set_format(fd);
    nbufs = request_buffer(fd, NBUF);
    if (nbufs > NBUF) {
        fprintf(stderr, "Increase NBUF to at least %i\n", nbufs);
        exit(1);
    }

    for (int i = 0; i < NBUF; i++) {

        /* Assume all buffers are of equal size.. */
        size = query_buffer(fd, i, &buffer[i]);

        queue_buffer(fd, i);
    }

    start_streaming(fd);
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    struct timeval tv = {0};
    tv.tv_sec = 2;
    int r = select(fd+1, &fds, NULL, NULL, &tv);
    if(-1 == r){
        perror("Waiting for Frame");
        exit(1);
    }

    index = dequeue_buffer(fd);
    int file = open("output.raw", O_RDWR | O_CREAT, 0666);
    fprintf(stderr, "file == %i\n", file);
    write(file, buffer[index], size);

    stop_streaming(fd);

    close(file);
    close(fd);

    return 0;
}

Route traffic with NAT

Route traffic with NAT

A long time ago I wrote a blog post [1] about how to use NAT to route traffic to your embedded device via your host computer.

Back then we used iptables to achieve it; nowadays nftables is the preferred successor, so it is time for an update.

What is NAT anyway?

/media/nat.png

Network Address Translation, or NAT, maps one address space onto another by modifying the network address information in the IP header of each packet. This is how your router is able to route your local network out to the internet.

Sharing an internet connection this way can be very practical when working with embedded devices. The network may have restrictions/authentication that stops you from plugging your device directly into it, your traffic may have to go via a VPN connection that your host has configured, your device may only have a USB interface available... the use cases are many.

If your device has neither Ethernet nor WiFi, but does have USB with OTG support, you can still share internet by setting up an RNDIS gadget device.

Setup

Host setup

  • eth0 has the IP address 192.168.1.50 and is connected to the internet
  • usb0 has the IP address 10.2.234.1 and is connected to the target device via RNDIS

The best way to configure nftables is to do it by script. We will set up two rules:

  • A NAT chain to masquerade packets, and
  • A forward rule to route packets between usb0 and eth0

#!/usr/sbin/nft -f

table ip imx8_table {
        chain imx8_nat {
                type nat hook postrouting priority 0; policy accept;
                oifname "eth0" masquerade
        }

        chain imx8_forward {
                type filter hook forward priority 0; policy accept;
                iifname "usb0" oifname "eth0" accept
        }
}

We also have to enable IP forwarding. This could be done in several ways:

  • Via sysctl on command line:

    sudo sysctl -w net.ipv4.ip_forward=1
    
  • Via sysctl configuration file:

    echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.conf
    sudo /sbin/sysctl -p
    
  • Via procfs:

    echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
    

Target setup

  • usb0 has the IP address 10.2.234.100 and is connected to the host.

You only need to make sure that all traffic is routed via usb0 by setting up a default route:

route add default gw 10.2.234.1 usb0
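The route command comes from the legacy net-tools package; on targets that ship iproute2 instead (the common case these days), the equivalent command is:

```shell
# Route all traffic via the host's RNDIS interface
ip route add default via 10.2.234.1 dev usb0
```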

That is all. You should now be able to route your traffic out to the internet:

ping www.google.se

PING www.google.se (216.58.211.3) 56(84) bytes of data.
64 bytes from muc03s13-in-f3.1e100.net (216.58.211.3): icmp_seq=1 ttl=57 time=12.4 ms
64 bytes from muc03s13-in-f3.1e100.net (216.58.211.3): icmp_seq=2 ttl=57 time=12.4 ms
64 bytes from muc03s13-in-f3.1e100.net (216.58.211.3): icmp_seq=3 ttl=57 time=12.5 ms
64 bytes from muc03s13-in-f3.1e100.net (216.58.211.3): icmp_seq=4 ttl=57 time=12.5 ms

Contiguous Memory Allocator

Contiguous Memory Allocator

Introduction

I find memory management to be one of the most fascinating subsystems in the Linux kernel, and I take every chance I get to talk about it. This post is inspired by a project I'm currently working on: an embedded Linux platform with a camera connected to the CSI-2 bus.

Before we dig into which problems we could trip over, let's talk briefly about how the kernel handles memory.

Memory subsystem

The memory management subsystem handles a wide spectrum of operations which all have an impact on system performance. The subsystem is therefore divided into several parts to sustain operational efficiency and optimize resource handling for different use cases.

Such parts include:

  • Page allocator
  • Buddy system
  • Kmalloc allocator
  • Slab caches
  • Vmalloc allocator
  • Contiguous memory allocator
  • ...

The smallest allocation unit of memory is the page frame. The Memory Management Unit (MMU) does a terrific job of arranging and mapping these page frames of the available physical memory into a virtual address space. Most allocations in the kernel are only virtually contiguous, which is fine for most use cases.

Some hardware/IP-blocks require physically contiguous memory to work, though. Direct Memory Access (DMA) transfers are one such case where memory (often) needs to be physically contiguous. Many DMA controllers nowadays support scatter-gather, which lets you hand-pick addresses to make the memory appear contiguous and then lets the (IO)MMU do the rest.

For this to work, the hardware/IP-block must actually do its memory accesses through the (IO)MMU, which is not always the case.

Multimedia devices such as GPUs or VPUs often require huge blocks of physically contiguous memory and do (with exceptions, see Raspberry Pi 4 below) not make use of the (IO)MMU.

Contiguous memory

In order to meet this requirement for big chunks of physically contiguous memory, we have to reserve it from the main memory during system boot.

Before CMA, we had to use the mem kernel parameter to limit how much of the system memory should be available to the allocators in the Linux system.

The memory outside this mem region is not touched by the system and can be remapped into the linear address space by the driver.

Here is the documentation for the mem kernel parameter [1]:

mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
                Amount of memory to be used in cases as follows:

                1 for test;
                2 when the kernel is not able to see the whole
                system memory;
                3 memory that lies after 'mem=' boundary is
                excluded from the hypervisor, then
                assigned to KVM guests.
                4 to limit the memory available for kdump kernel.

                [ARC,MICROBLAZE] - the limit applies only to low memory,
                high memory is not affected.

                [ARM64] - only limits memory covered by the linear
                mapping. The NOMAP regions are not affected.

                [X86] Work as limiting max address. Use together
                with memmap= to avoid physical address space collisions.
                Without memmap= PCI devices could be placed at addresses
                belonging to unused RAM.

                Note that this only takes effects during boot time since
                in above case 3, memory may need be hot added after boot
                if system memory of hypervisor is not sufficient.

The mem parameter has a few drawbacks. The driver needs details about where to find the reserved memory, and the memory lies unused whenever the driver is not initiating any access operations.

Therefore the Contiguous Memory Allocator (CMA) was introduced to manage these reserved memory areas.

The benefit of using CMA is that this area is handled by the allocator algorithms instead of the device driver itself. This lets both devices and the system allocate and use memory from the CMA area: through the page allocator for regular needs, and through the DMA allocation routines when DMA capabilities are needed.

A few words about Raspberry Pi

Raspberry Pi uses a configuration file (config.txt) that is read by the GPU to initialize the system. The configuration file has many tweakable parameters, one of which is gpu_mem.

This parameter specifies how much memory (in megabytes) to reserve exclusively for the GPU. It works pretty much like the mem kernel command-line parameter described above, with the very same drawbacks. The memory reserved for the GPU is not available to the ARM CPU and should be kept as low as your application can manage.

One big difference between the Raspberry Pi variants is that the Raspberry Pi 4 has a GPU with its own MMU, which allows the GPU to use memory that is dynamically allocated within Linux. gpu_mem can therefore be kept small on that platform.

The GPU is normally used for displays, 3D calculations, codecs and cameras. One important thing regarding the camera is that the default camera stack (libcamera) uses CMA memory to allocate buffers instead of the reserved GPU memory. So if the GPU is used for camera purposes only, gpu_mem can be kept small.

How much CMA is already reserved?

The easiest way to determine how much memory is reserved for CMA is to consult meminfo:

# grep Cma /proc/meminfo
CmaTotal:         983040 kB
CmaFree:          612068 kB

or look at the boot log:

# dmesg | grep CMA
[    0.000000] Reserved memory: created CMA memory pool at 0x0000000056000000, size 960 MiB

Reserve memory with CMA

/media/reserved.jpg

The CMA area is reserved during boot and there are a few ways to do this.

By device tree

This is the preferred way to define CMA areas.

This example is taken from the device tree bindings documentation [2]:

reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;

    /* global autoconfigured region for contiguous allocations */
    linux,cma {
        compatible = "shared-dma-pool";
        reusable;
        size = <0x4000000>;
        alignment = <0x2000>;
        linux,cma-default;
    };
};

By kernel command line

The CMA area size can also be specified on the kernel command line. There are tons of references out there stating that the command-line parameter is overridden by the device tree, but that sounded weird to me so I looked it up: the kernel command line overrides the device tree, not the other way around.

At least nowadays:

static int __init rmem_cma_setup(struct reserved_mem *rmem)
{
    ...
    if (size_cmdline != -1 && default_cma) {
        pr_info("Reserved memory: bypass %s node, using cmdline CMA params instead\n",
            rmem->name);
        return -EBUSY;
    }
    ...
}

Here is the documentation for the cma kernel parameter [1]:

cma=nn[MG]@[start[MG][-end[MG]]]
                [KNL,CMA]
                Sets the size of kernel global memory area for
                contiguous memory allocations and optionally the
                placement constraint by the physical address range of
                memory allocations. A value of 0 disables CMA
                altogether. For more information, see
                kernel/dma/contiguous.c

By kernel configuration

The kernel configuration can be used to set a min/max or even a percentage of the available memory that should be reserved for the CMA area:

CONFIG_CMA
CONFIG_CMA_AREAS
CONFIG_DMA_CMA
CONFIG_DMA_PERNUMA_CMA
CONFIG_CMA_SIZE_MBYTES
CONFIG_CMA_SIZE_SEL_MBYTES
CONFIG_CMA_SIZE_SEL_PERCENTAGE
CONFIG_CMA_SIZE_SEL_MIN
CONFIG_CMA_SIZE_SEL_MAX
CONFIG_CMA_ALIGNMENT

Conclusion

As soon as we use camera devices with higher resolutions and do the image manipulation in the VPU/GPU, we almost always have to increase the CMA area size. Otherwise we will end up with errors like this:

cma_alloc: alloc failed, req-size: 8192 pages, ret: -12

Use custom EDID in Linux

Use custom EDID in Linux

Extended Display Identification Data (EDID) is a metadata format for display devices to describe their capabilities such as resolution, display size, timing, bit depth and update frequency. It is a 128-byte (EDID) or 256-byte (Enhanced-EDID) structure transferred from the display device over the Display Data Channel (DDC) protocol, which is a layer on top of the I2C specification.

The EDID is accessible via the I2C address 0x50 and can usually be read even if the display is turned off, which is quite nice.

Before the Video Electronics Standards Association (VESA) came up with this standard, there were multiple non-standard ways out there to provide some kind of basic identification of video devices.

Handling all these non-standard ways is of course an unmanageable situation. In the good old days we had to explicitly set all graphics parameters in the xorg.conf file.

Hooray for standards!

Read out the EDID structure

The EDID structure is available for DRM (Direct Rendering Manager) devices via sysfs in raw binary format:

$ od -Anone -t x1 /sys/devices/pci0000:00/0000:00:02.0/drm/card1/card1-DP-4/edid
     00 ff ff ff ff ff ff 00 41 0c c9 c0 ae 00 00 00
     1a 17 01 03 80 3c 22 78 2a 25 95 a9 54 4f a1 26
     0a 50 54 bd 4b 00 d1 00 d1 c0 81 80 95 0f 95 00
     b3 00 81 c0 a9 40 56 5e 00 a0 a0 a0 29 50 30 20
     35 00 55 50 21 00 00 1e 00 00 00 ff 00 41 55 34
     31 33 32 36 30 30 30 31 37 34 00 00 00 fc 00 50
     68 69 6c 69 70 73 20 32 37 32 43 34 00 00 00 fd
     00 32 4c 1e 63 21 00 0a 20 20 20 20 20 20 00 ac

read_edid [1] provides some tools to retrieve and interpret monitor specifications using the VESA DDC protocol. parse-edid is part of this package and we can use it to parse the EDID structure above:

$ parse-edid < /sys/devices/pci0000:00/0000:00:02.0/drm/card1/card1-DP-4/edid
Checksum Correct

Section "Monitor"
    Identifier "Philips 272C4"
    ModelName "Philips 272C4"
    VendorName "PHL"
    # Monitor Manufactured week 26 of 2013
    # EDID version 1.3
    # Digital Display
    DisplaySize 600 340
    Gamma 2.20
    Option "DPMS" "true"
    Horizsync 30-99
    VertRefresh 50-76
    # Maximum pixel clock is 330MHz
    #Not giving standard mode: 1920x1200, 60Hz
    #Not giving standard mode: 1920x1080, 60Hz
    #Not giving standard mode: 1280x1024, 60Hz
    #Not giving standard mode: 1440x900, 75Hz
    #Not giving standard mode: 1440x900, 60Hz
    #Not giving standard mode: 1680x1050, 60Hz
    #Not giving standard mode: 1280x720, 60Hz
    #Not giving standard mode: 1600x1200, 60Hz
    Modeline        "Mode 0" +hsync +vsync
EndSection

This is the EDID for my Philips Monitor.

Provide custom EDID structure to DRM

I'm working with a custom projector (yes, projectors are display devices too) board for an embedded Linux system. Unfortunately, the processor has an erratum for the DDC channel which causes the retrieved EDID structure to be corrupt, so I have to provide the EDID information to the DRM layer manually.

For such situations, the kernel has the CONFIG_DRM_LOAD_EDID_FIRMWARE configuration option. It lets you place individually prepared EDID data in the /lib/firmware directory to be loaded instead of retrieving it over the DDC channel. The functionality is disabled by default as it is mostly a workaround for broken hardware, and luckily you will have to search hard to find such hardware these days.

The sources also contain a few built-in [2] structures for commonly used screen resolutions:

#define GENERIC_EDIDS 6
static const char * const generic_edid_name[GENERIC_EDIDS] = {
    "edid/800x600.bin",
    "edid/1024x768.bin",
    "edid/1280x1024.bin",
    "edid/1600x1200.bin",
    "edid/1680x1050.bin",
    "edid/1920x1080.bin",
};

See the kernel documentation [3] for more details.

Use the custom EDID structure

We can either place our custom EDID data in /lib/firmware/edid/ or use one of the built-in structures. Either way, pass drm_kms_helper.edid_firmware pointing to the right structure as an argument to the kernel.

Example bootargs that use the built-in 800x600 EDID structure:

drm_kms_helper.edid_firmware=edid/800x600.bin

Here is my projector in action showing the Qt Analog Clock [4] example.

/media/edid-clock.jpg

(Yes, crappy image, it looks much better IRL)