Mutex guards in the Linux kernel

I found an interesting thread [1] while searching my inbox for something completely unrelated.

Peter Zijlstra has written a few cleanup functions that were introduced in v6.5 with this commit:

commit 54da6a0924311c7cf5015533991e44fb8eb12773
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri May 26 12:23:48 2023 +0200

    locking: Introduce __cleanup() based infrastructure

    Use __attribute__((__cleanup__(func))) to build:

     - simple auto-release pointers using __free()

     - 'classes' with constructor and destructor semantics for
       scope-based resource management.

     - lock guards based on the above classes.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

It adds functionality to "guard" locks. The guard wraps the lock, takes ownership of the given mutex and releases it as soon as the guard goes out of scope. In other words - no more forgotten locks due to early exits.

Compare this to the std::lock_guard class we have in C++.

Although this adds valuable functionality to the core, it is currently not widely used. In fact, it only has two users in the latest (v6.6) kernel:

	$ git grep -l "guard(mutex)" 
	drivers/gpio/gpio-sim.c
	kernel/sched/core.c

Hands on

I have adapted ([2], [3]) two of my drivers to make use of the guard locks. The adaptation is quickly done.

The feature is located in linux/cleanup.h:

+#include <linux/cleanup.h>

Then we can start to make use of the guards. What I like is that the code will be simpler in two ways:

  • All mutex_lock()/mutex_unlock() pairs in the same scope can be replaced with guard(mutex)(&your->mutex).
  • The code can now return without having to release any taken locks.

Together with device managed (devm) resources, you will end up with code that cleans up after itself pretty well.

A typical adaptation to guarded mutexes could look like this:

	@@ -83,31 +85,26 @@ static int pxrc_open(struct input_dev *input)
		struct pxrc *pxrc = input_get_drvdata(input);
		int retval;

	-       mutex_lock(&pxrc->pm_mutex);
	+       guard(mutex)(&pxrc->pm_mutex);
		retval = usb_submit_urb(pxrc->urb, GFP_KERNEL);
		if (retval) {
			dev_err(&pxrc->intf->dev,
				"%s - usb_submit_urb failed, error: %d\n",
				__func__, retval);
	-               retval = -EIO;
	-               goto out;
	+               return -EIO;
		}

		pxrc->is_open = true;
	-
	-out:
	-       mutex_unlock(&pxrc->pm_mutex);
	-       return retval;
	+       return 0;
	 }

What it does is:

  • Removes the mutex_lock/mutex_unlock pair
  • Simplifies the error handling to just return in case of error
  • Removes the out: label, which is no longer needed

Under the hood

The implementation makes use of the __attribute__((cleanup())) attribute that is available for both LLVM [4] and GCC [5].

Here is what the GCC documentation [5] says about the cleanup_function:

cleanup (cleanup_function)
The cleanup attribute runs a function when the variable goes out of scope. This attribute can only be applied to auto function scope variables; it may not be applied to parameters or variables with static storage duration.
The function must take one parameter, a pointer to a type compatible with the variable. The return value of the function (if any) is ignored.

If -fexceptions is enabled, then cleanup_function is run during the stack unwinding that happens during the processing of the exception.
Note that the cleanup attribute does not allow the exception to be caught, only to perform an action. It is undefined what happens if cleanup_function does not return normally.

To illustrate this, consider the following example:

#include <stdio.h>

void cleanup_func (int *x)
{
	printf("Tidy up for x as it is leaving its scope\n");
}

int main(int argc, char **argv)
{
	printf("Start\n");
	{
		int x __attribute__((cleanup(cleanup_func)));
		/* Do stuff */
	}
	printf("Exit\n");
}

We create a variable, x, declared with the cleanup attribute inside its own scope. This means that cleanup_func() will be called as soon as x goes out of scope.

Here is the output of the example above:

$ gcc main.c -o main && ./main
Start
Tidy up for x as it is leaving its scope
Exit

As you can see, cleanup_func() is called between Start and Exit - as expected.
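
The kernel builds guard(mutex) on top of exactly this attribute. Roughly - and this is a simplified sketch, the real macros in linux/cleanup.h and linux/mutex.h are more elaborate - the guard is a "class" with constructor and destructor semantics:

typedef struct mutex *class_mutex_t;

/* Constructor: takes the lock when the guard variable is initialized */
static inline class_mutex_t class_mutex_constructor(struct mutex *lock)
{
	mutex_lock(lock);
	return lock;
}

/* Destructor: releases the lock when the variable goes out of scope */
static inline void class_mutex_destructor(class_mutex_t *p)
{
	mutex_unlock(*p);
}

/* Inside a function, guard(mutex)(&m) then expands to roughly: */
class_mutex_t scope __attribute__((cleanup(class_mutex_destructor))) =
	class_mutex_constructor(&m);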

Test packages in Buildroot

When writing packages for Buildroot there are several conditions that you have to test your package against.

This includes different toolchains, architectures, C libraries, thread implementations and more. To help you with that, Buildroot provides the utils/test-pkg script.

Nothing describes the script better than its own help text [1]:

test-pkg: test-build a package against various toolchains and architectures

The supplied config snippet is appended to each toolchain config, the
resulting configuration is checked to ensure it still contains all options
specified in the snippet; if any is missing, the build is skipped, on the
assumption that the package under test requires a toolchain or architecture
feature that is missing.

In case failures are noticed, you can fix the package and just re-run the
same command again; it will re-run the test where it failed. If you did
specify a package (with -p), the package build dir will be removed first.

The list of toolchains is retrieved from support/config-fragments/autobuild/toolchain-configs.csv.
Only the external toolchains are tried, because building a Buildroot toolchain
would take too long. An alternative toolchains CSV file can be specified with
the -t option. This file should have lines consisting of the path to the
toolchain config fragment and the required host architecture, separated by a
comma. The config fragments should contain only the toolchain and architecture
settings.

By default, a useful subset of toolchains is tested. If needed, all
toolchains can be tested (-a), an arbitrary number of toolchains (-n
in order, -r for random).

Hands on

In these examples I'm going to build the CRIU package that I recently worked on.

First I will create a config snippet that contains the necessary options to enable my package. In my case it is CRIU and HOST_PYTHON3:

     cat > criu.config <<EOF
     BR2_PACKAGE_HOST_PYTHON3=y
     BR2_PACKAGE_CRIU=y
     EOF

I can now start test-pkg and provide the config snippet:

     utils/test-pkg -c criu.config -p criu
                     bootlin-armv5-uclibc [1/6]: FAILED
                      bootlin-armv7-glibc [2/6]: FAILED
                    bootlin-armv7m-uclibc [3/6]: SKIPPED
                      bootlin-x86-64-musl [4/6]: OK
                       br-arm-full-static [5/6]: SKIPPED
                             sourcery-arm [6/6]: FAILED
     6 builds, 2 skipped, 3 build failed, 0 legal-info failed, 0 show-info failed

Some of them fail; the end of each build log is available at ~/br-test-pkg/*/logfile.

Now read the log files and fix the errors for each failed test.

Once test-pkg (without -a) runs with no failures, it is a good idea to retry with the -a (run all tests) option:

     utils/test-pkg -a -c criu.config -p criu

As more and more tests pass, it takes an unnecessary amount of time to run the same (successful) tests again, so it is better to hand-pick the tests that actually fail.

To do this you may provide a specific list of toolchain configurations:

     cp support/config-fragments/autobuild/toolchain-configs.csv criu-toolchains.csv
     # edit criu-toolchains.csv to keep your toolchains of interest.
     utils/test-pkg -a -t criu-toolchains.csv -c criu.config -p criu

This will retest only the toolchains kept in the csv.

Git version in CMake

All applications have versions. The version should somehow be exposed in the application to make it possible to determine which application we are actually running.

I've seen plenty of variants of how this is achieved; some are good and some are really bad. Since it's such a common thing, I thought I'd show how I usually do it.

I usually let CMake determine the version based on git describe and tags. The benefit is that it is part of the build process (i.e. no extra step on the build server - I've seen it all...) and that you get correct version information for local builds as well.

git describe also gives you useful information that allows you to trace a build back to the actual commit. Incredibly useful.

The version information will be stored as a define in a header file which should be included into the project.

Hands on

The version file we intend to include in the project is generated from Version.h.in:

#pragma once
#define SOFTWARE_VERSION "@FOO_VERSION@"

@FOO_VERSION@ is a placeholder that will be replaced by configure_file() later on.
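
For example, if git describe were to report v1.2.3-4-gdeadbeef (a made-up version for illustration), the generated Version.h would end up as:

#pragma once
#define SOFTWARE_VERSION "v1.2.3-4-gdeadbeef"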

The "core" functionality is in GenerateVersion.cmake:

find_package(Git)

if(GIT_EXECUTABLE)
  get_filename_component(WORKING_DIR ${SRC} DIRECTORY)
  execute_process(
    COMMAND ${GIT_EXECUTABLE} describe --tags --dirty
    WORKING_DIRECTORY ${WORKING_DIR}
    OUTPUT_VARIABLE FOO_VERSION
    RESULT_VARIABLE ERROR_CODE
    OUTPUT_STRIP_TRAILING_WHITESPACE
    )
endif()

if(NOT DEFINED FOO_VERSION OR FOO_VERSION STREQUAL "")
  set(FOO_VERSION 0.0.0-unknown)
  message(WARNING "Failed to determine version from Git tags. Using default version \"${FOO_VERSION}\".")
endif()

configure_file(${SRC} ${DST} @ONLY)

It will look for the Git package, retrieve the tag information with git describe and store the version in the FOO_VERSION variable.

configure_file() is then used to create Version.h out of Version.h.in, where the FOO_VERSION placeholder is replaced by the actual version.

To use GenerateVersion.cmake, we have to create a custom target in the CMakeLists.txt file and provide the source and destination path for the version file:

add_custom_target(version
  ${CMAKE_COMMAND} -D SRC=${CMAKE_SOURCE_DIR}/Version.h.in
                   -D DST=${CMAKE_SOURCE_DIR}/Version.h
                   -P ${CMAKE_SOURCE_DIR}/GenerateVersion.cmake
  )

Finally, make sure that the target (${PROJECT_NAME} in this case) depends on the version target:

add_dependencies(${PROJECT_NAME} version)
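
In the application, the generated header is then used like any other header. A minimal (made-up) example:

#include <stdio.h>
#include "Version.h" /* generated by GenerateVersion.cmake */

int main(void)
{
	printf("version: %s\n", SOFTWARE_VERSION);
	return 0;
}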

That's basically it.

Burn eFuses for MAC address on iMX8MP

The iMX processors (iMX6, iMX7, iMX8) all have a similar OCOTP (On-Chip One Time Programmable) module that stores e.g. the MAC addresses for the internal ethernet controllers.

The reference manual is not clear on either the byte order or which bytes belong to which MAC address when there are several. In fact, I had to look at the U-boot implementation [1] to know for sure how these fuses are used:

void imx_get_mac_from_fuse(int dev_id, unsigned char *mac)
{
	struct imx_mac_fuse *fuse;
	u32 offset;
	bool has_second_mac;

	offset = is_mx6() ? MAC_FUSE_MX6_OFFSET : MAC_FUSE_MX7_OFFSET;
	fuse = (struct imx_mac_fuse *)(ulong)(OCOTP_BASE_ADDR + offset);
	has_second_mac = is_mx7() || is_mx6sx() || is_mx6ul() || is_mx6ull() || is_imx8mp();

	if (has_second_mac && dev_id == 1) {
		u32 value = readl(&fuse->mac_addr2);

		mac[0] = value >> 24;
		mac[1] = value >> 16;
		mac[2] = value >> 8;
		mac[3] = value;

		value = readl(&fuse->mac_addr1);
		mac[4] = value >> 24;
		mac[5] = value >> 16;

	} else {
		u32 value = readl(&fuse->mac_addr1);

		mac[0] = value >> 8;
		mac[1] = value;

		value = readl(&fuse->mac_addr0);
		mac[2] = value >> 24;
		mac[3] = value >> 16;
		mac[4] = value >> 8;
		mac[5] = value;
	}
}

OCOTP Layout

The fuses related to MAC addresses start at offset 0x640 for iMX7 and iMX8MP, and at offset 0x620 for all iMX6 processors.

The MAC fuses belong to fuse bank 9, as seen in the table below:

/media/imx8-mac-fuse-layout.png

Burn fuses

There are several ways to burn the fuses nowadays. A few years ago, the only way (that I'm aware of) was via the non-mainlined fsl_otp driver provided in the Freescale kernel tree. I'm not going to describe how to use it since it should not be used anyway.

The fuses are mapped to the MAC address as described in this picture:

/media/imx8-mac-fuse-example.png

The iMX8MP has two MACs and we will assign the MAC address 00:bb:cc:dd:ee:ff for MAC0 and 00:22:33:44:55:66 for MAC1.
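
Given the U-boot code above, we can work out exactly which fuse word holds which bytes of our two addresses:

/*
 * MAC0 = 00:bb:cc:dd:ee:ff:
 *   mac[0..1] come from bits 15..0 of mac_addr1
 *   mac[2..5] come from bits 31..0 of mac_addr0
 *
 * MAC1 = 00:22:33:44:55:66:
 *   mac[0..3] come from bits 31..0  of mac_addr2
 *   mac[4..5] come from bits 31..16 of mac_addr1
 *
 * This gives us:
 *   mac_addr0 (bank 9, word 0) = 0xccddeeff  (cc:dd:ee:ff of MAC0)
 *   mac_addr1 (bank 9, word 1) = 0x556600bb  (55:66 of MAC1, 00:bb of MAC0)
 *   mac_addr2 (bank 9, word 2) = 0x00223344  (00:22:33:44 of MAC1)
 */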

Via U-boot

With the CONFIG_CMD_FUSE config set, U-boot is able to burn and sense eFuses via the fuse command:

u-boot=> fuse
fuse - Fuse sub-system

Usage:
fuse read <bank> <word> [<cnt>] - read 1 or 'cnt' fuse words,
    starting at 'word'
fuse sense <bank> <word> [<cnt>] - sense 1 or 'cnt' fuse words,
    starting at 'word'
fuse prog [-y] <bank> <word> <hexval> [<hexval>...] - program 1 or
    several fuse words, starting at 'word' (PERMANENT)
fuse override <bank> <word> <hexval> [<hexval>...] - override 1 or
    several fuse words, starting at 'word'

Burn the fuses with fuse prog:

fuse prog -y 9 0 0xccddeeff
fuse prog -y 9 1 0x556600bb
fuse prog -y 9 2 0x00223344

And read it back with fuse sense:

u-boot=> fuse sense 9 0
Sensing bank 9:

Word 0x00000000: ccddeeff
u-boot=> fuse sense 9 1
Sensing bank 9:

Word 0x00000001: 556600bb
u-boot=> fuse sense 9 2
Sensing bank 9:

Word 0x00000002: 00223344

As it is a U-boot command, it is also possible to burn the fuses with UUU [2] (Universal Update Utility) via the SDP protocol. That could be handy e.g. in production.

Example of a UUU script:

$ cat imx8mplus-emmc-all.uuu 
uuu_version 1.2.39

# This script will flash u-boot to mmc on bus 1
# Usage: uuu <script>

SDPS: boot -f ../imx-boot

#Burn fuses
FB: ucmd fuse prog -y 9 0 0xccddeeff
FB: ucmd fuse prog -y 9 1 0x556600bb
FB: ucmd fuse prog -y 9 2 0x00223344

#Burn image
FB: ucmd setenv emmc_dev 2
FB: ucmd setenv emmc_ack 1
FB: ucmd setenv fastboot_dev mmc
FB: ucmd setenv mmcdev ${emmc_dev}
FB: ucmd mmc dev ${emmc_dev}
FB: flash -raw2sparse all ../distro-image-dev-imx8mp.wic
FB: ucmd mmc partconf ${emmc_dev} ${emmc_ack} 1 0
FB: done

Via nvmem-imx-ocotp

The OCOTP module is exposed by the nvmem-imx-ocotp (CONFIG_NVMEM_IMX_OCOTP) driver, and the fuses can be read and written via the sysfs entry /sys/devices/platform/soc@0/30000000.bus/30350000.efuse/imx-ocotp0/nvmem.

Note that it is not the full OCOTP module but only the eFuses that are exposed this way, so MAC_ADDR0 is placed at offset 0x90, not 0x640!

We can read out our MAC addresses at offset 0x90:

root@imx8mp:~# hexdump /sys/devices/platform/soc@0/30000000.bus/30350000.efuse/imx-ocotp0/nvmem 
0000000 a9eb ffaf aaff 0002 52bb ea35 6000 1119
0000010 4591 2002 0000 0100 007f 0000 2000 9800
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0000040 bada bada bada bada bada bada bada bada
*
0000060 0000 0000 0000 0000 0000 0000 0000 0000
*
0000080 0000 0000 0000 0000 0000 0000 0004 0000
0000090 eeff ccdd 00bb 5566 3344 0022 0000 0000

We can also see that we have the expected MAC addresses set for our interfaces:

root@imx8mp:~# ip a
4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:bb:cc:dd:ee:ff brd ff:ff:ff:ff:ff:ff
5: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff

Loopback with two (physical) ethernet interfaces

Imagine that you have an embedded device with two physical ethernet ports. You want to verify the functionality of both these ports in the manufacturing process, so you connect an ethernet cable between the ports, set up IP addresses - and now what?

As Linux (actually the default network namespace) is aware of both adapters and their IP/MAC addresses, the system sees no reason to send any traffic out. Instead, Linux will loop all traffic between the interfaces internally.

To avoid that and actually force traffic out on the cable, we have to make the adapters unaware of each other. This is done by putting them into different network namespaces!

/media/loopback.png

Hands on

To do this, all you need is support for network namespaces in the kernel (CONFIG_NET_NS=y) and the iproute2 [1] package, both of which are probably included in every standard Linux distribution nowadays.

We will create two network namespaces, let's call them netns_eth0 and netns_eth1:

ip netns add netns_eth0
ip netns add netns_eth1

Move each adapter to their new home:

ip link set eth0 netns netns_eth0
ip link set eth1 netns netns_eth1

Assign IP addresses:

ip netns exec netns_eth0 ip addr add dev eth0 192.168.0.1/24
ip netns exec netns_eth1 ip addr add dev eth1 192.168.0.2/24

Bring up the interfaces:

ip netns exec netns_eth0 ip link set eth0 up
ip netns exec netns_eth1 ip link set eth1 up

Now we can ping each interface and know for sure that the traffic is actually on the cable:

ip netns exec netns_eth0 ping 192.168.0.2
ip netns exec netns_eth1 ping 192.168.0.1

kas-container and QEMU

KAS

KAS [1] is a setup tool for bitbake-based projects such as Yocto. There are many similar alternatives out there and I've tried most of them, but my absolute favorite is KAS.

In order to use KAS, you have to set up a YAML file that contains information about your machine, distribution, meta layers and local configuration. Here is a small example configuration copied from the KAS documentation:

# Every file needs to contain a header, that provides kas with information
# about the context of this file.
header:
  # The `version` entry in the header describes for which configuration
  # format version this file was created for. It is used by kas to figure
  # out if it is compatible with this file. The version is an integer that
  # is increased on every format change.
  version: x
# The machine as it is written into the `local.conf` of bitbake.
machine: qemux86-64
# The distro name as it is written into the `local.conf` of bitbake.
distro: poky
repos:
  # This entry includes the repository where the config file is located
  # to the bblayers.conf:
  meta-custom:
  # Here we include a list of layers from the poky repository to the
  # bblayers.conf:
  poky:
    url: "https://git.yoctoproject.org/git/poky"
    commit: 89e6c98d92887913cadf06b2adb97f26cde4849b
    layers:
      meta:
      meta-poky:
      meta-yocto-bsp:

bblayers_conf_header:
  meta-custom: |
    POKY_BBLAYERS_CONF_VERSION = "2"
    BBPATH = "${TOPDIR}"
    BBFILES ?= ""    
local_conf_header:
  meta-custom: |
    PATCHRESOLVE = "noop"
    CONF_VERSION = "1"
    IMAGE_FSTYPES = "tar"    

That is all you need to start building your distribution:

kas build kas-project.yml

kas-container

KAS also comes with kas-container. It does the same thing and takes the same arguments as the kas command, but it executes in a container (either docker or podman) instead.

For people like me who use ArchLinux (or other rolling distributions), building in a container is preferred, as you will otherwise end up with weird incompatibility problems pretty soon.

It is also useful on e.g. build servers as those tend to have an unpredictable environment as well.

Yocto and QEMU

Yocto [3] lets you emulate and virtualize the images you have built using the Yocto Project. It makes use of the runqemu helper script to find the build artifacts and set everything up to start the emulator.

To add support for qemu you have to include the qemuboot image class in your local.conf:

IMAGE_CLASSES += "qemuboot"

When you include the image class in your project, it will generate a *.qemuboot.conf file among your artifacts that contains the configuration for runqemu.

The configuration file [4] has many variables that can be overridden in your local.conf. Here are some of them:

  • QB_SYSTEM_NAME - qemu name, e.g., "qemu-system-i386"
  • QB_OPT_APPEND - options to append to qemu, e.g., "-device usb-mouse"
  • QB_DEFAULT_KERNEL - default kernel to boot
  • QB_DEFAULT_FSTYPE - default FSTYPE to boot
  • QB_MEM - memory
  • QB_MACHINE - qemu machine
  • QB_CPU - qemu cpu
  • QB_SMP - amount of CPU cores inside qemu guest, each mapped to a thread on the host
  • QB_KERNEL_CMDLINE_APPEND - options to append to kernel's -append option
  • QB_DTB - qemu dtb name
  • QB_AUDIO_DRV - qemu audio driver
  • QB_AUDIO_OPT - qemu audio option
  • QB_RNG - Pass-through for host random number generator
  • QB_KERNEL_ROOT - kernel's root gets passed to the kernel.
  • QB_NETWORK_DEVICE - network device
  • QB_TAP_OPT - network option for 'tap' mode
  • QB_SLIRP_OPT - network option for SLIRP mode
  • QB_CMDLINE_IP_SLIRP - If QB_NETWORK_DEVICE adds more than one network interface to qemu
  • QB_ROOTFS_OPT - used as rootfs
  • QB_SERIAL_OPT - serial port
  • QB_TCPSERIAL_OPT - tcp serial port option
  • QB_ROOTFS_EXTRA_OPT - extra options to be appended to the rootfs device in case there is none specified by QB_ROOTFS_OPT.
  • QB_GRAPHICS - QEMU video card type
  • QB_NFSROOTFS_EXTRA_OPT - extra options to be appended to the nfs rootfs options in kernel boot arg

QB_MEM is set to -m 256 by default. I had to increase it a lot as I'm running Azure in my setup.

kas-container and QEMU

Running qemu in kas-container is pretty straightforward; there are just a few things to keep in mind if you need a network connection.

By default, qemu maps a TAP [5] interface into the emulated environment to be able to route traffic to your network. First, this requires you to have the tap module loaded on the host:

sudo modprobe tap

runqemu then uses iptables to set up NAT routing for the TAP interface. Unfortunately, iptables is not included in the docker image, so we have to add it. You find the Dockerfile in the KAS repository [6].

diff --git a/Dockerfile b/Dockerfile
index 0e79cb5..c331e45 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -33,7 +33,7 @@ RUN apt-get update && \
         python3-pip python3-setuptools python3-wheel python3-yaml python3-distro python3-jsonschema \
         python3-newt python3-colorlog python3-kconfiglib \
         gosu lsb-release file vim less procps tree tar bzip2 zstd pigz lz4 unzip tmux libncurses-dev \
-        git-lfs mercurial iproute2 ssh-client telnet curl rsync gnupg awscli sudo \
+        git-lfs mercurial iproute2 ssh-client telnet curl rsync gnupg awscli sudo iptables \
         socat bash-completion && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
diff --git a/kas-container b/kas-container
index 8fa2d16..2cd88e1 100755
--- a/kas-container
+++ b/kas-container
@@ -135,7 +135,7 @@ run_clean() {
        fi
 }

Now we have to expose the TAP device into the container and give it permission to create network rules. This is done by passing --device /dev/net/tun:/dev/net/tun and --cap-add=NET_ADMIN as arguments to docker.

BE AWARE THAT YOU ARE NOW GIVING THE CONTAINER THE NET_ADMIN CAPABILITY!

Now we are ready to start a shell:

kas-container --docker-args "--device /dev/net/tun:/dev/net/tun --cap-add=NET_ADMIN"  shell ./kas-project.yml
/media/kas-container-qemu.png

As we do not export a framebuffer device to the container, we start runqemu with the nographic parameter.

runqemu nographic

[  OK  ] Finished IPv6 Packet Filtering Framework.
[  OK  ] Finished IPv4 Packet Filtering Framework.
[  OK  ] Reached target Preparation for Network.
         Starting Network Configuration...
[  OK  ] Finished OpenSSH Key Generation.
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started User Login Management.
[  OK  ] Started Network Configuration.
         Starting Wait for Network to be Configured...
         Starting Network Name Resolution...
[  OK  ] Started Network Name Resolution.
[  OK  ] Reached target Network.
[  OK  ] Reached target Host and Network Name Lookups.
         Starting containerd container runtime...
         Starting DNS forwarder and DHCP server...
         Starting Hostapd IEEE 802.…A2/EAP/RADIUS Authenticator...
[  OK  ] Started DNS forwarder and DHCP server.
[  OK  ] Started containerd container runtime.

Test Distro v1 qemuarm64 ttyAMA0

qemuarm64 login:

Here we go.

Support for CRIU in Buildroot

A couple of months ago I started to evaluate [1] CRIU [2] for a project I'm working on. The project itself is using Buildroot to build and generate the root filesystem. Unfortunately, Buildroot lacked support for CRIU, so there was some work to do.

/media/buildroot-plus-criu.png

Writing the package was not straightforward. The package is only supported on certain architectures, and the utils/test-pkg script failed for a few toolchains. Julien Olivain was really helpful in sorting it out, and he even wrote runtime test scripts for it. Thanks for that.

I do not understand why projects still use custom Makefiles instead of CMake or Autotools, though. Is it something essential that I've completely missed?

Kernel configuration

CRIU makes use of a lot of features that have to be enabled in the Linux kernel for full usage.

CONFIG_CHECKPOINT_RESTORE will be set by the package itself, but there are more configuration options that could be useful depending on how you intend to use the tool.

Relevant configuration options are:

General setup options
  • CONFIG_CHECKPOINT_RESTORE=y (Checkpoint/restore support)
  • CONFIG_NAMESPACES=y (Namespaces support)
  • CONFIG_UTS_NS=y (Namespaces support -> UTS namespace)
  • CONFIG_IPC_NS=y (Namespaces support -> IPC namespace)
  • CONFIG_SYSVIPC_SYSCTL=y
  • CONFIG_PID_NS=y (Namespaces support -> PID namespaces)
  • CONFIG_NET_NS=y (Namespaces support -> Network namespace)
  • CONFIG_FHANDLE=y (Open by fhandle syscalls)
  • CONFIG_EVENTFD=y (Enable eventfd() system call)
  • CONFIG_EPOLL=y (Enable eventpoll support)
  • CONFIG_RSEQ=y (Enable rseq() system call)
Networking support -> Networking options (options for the sock-diag subsystem)
  • CONFIG_UNIX_DIAG=y (Unix domain sockets -> UNIX: socket monitoring interface)
  • CONFIG_INET_DIAG=y (TCP/IP networking -> INET: socket monitoring interface)
  • CONFIG_INET_UDP_DIAG=y (TCP/IP networking -> INET: socket monitoring interface -> UDP: socket monitoring interface)
  • CONFIG_PACKET_DIAG=y (Packet socket -> Packet: sockets monitoring interface)
  • CONFIG_NETLINK_DIAG=y (Netlink socket -> Netlink: sockets monitoring interface)
  • CONFIG_NETFILTER_XT_MARK=y (Networking support -> Networking options -> Network packet filtering framework (Netfilter) -> Core Netfilter Configuration -> Netfilter Xtables support (required for ip_tables) -> nfmark target and match support)
  • CONFIG_TUN=y (Networking support -> Universal TUN/TAP device driver support)

In the beginning of the project, CRIU had its own custom kernel that contained some experimental CRIU-related patches. Nowadays many of those patches have been mainlined.

One such patch [3] that I missed in my current kernel version (v5.10) was introduced in v5.12. It is related to how CRIU gets the process state and is essential for creating a checkpoint of a running process:

commit 90f093fa8ea48e5d991332cee160b761423d55c1
Author: Piotr Figiel <figiel@google.com>
Date:   Fri Feb 26 14:51:56 2021 +0100

    rseq, ptrace: Add PTRACE_GET_RSEQ_CONFIGURATION request

    For userspace checkpoint and restore (C/R) a way of getting process state
    containing RSEQ configuration is needed.

    There are two ways this information is going to be used:
     - to re-enable RSEQ for threads which had it enabled before C/R
     - to detect if a thread was in a critical section during C/R

    Since C/R preserves TLS memory and addresses RSEQ ABI will be restored
    using the address registered before C/R.

    Detection whether the thread is in a critical section during C/R is needed
    to enforce behavior of RSEQ abort during C/R. Attaching with ptrace()
    before registers are dumped itself doesn't cause RSEQ abort.
    Restoring the instruction pointer within the critical section is
    problematic because rseq_cs may get cleared before the control is passed
    to the migrated application code leading to RSEQ invariants not being
    preserved. C/R code will use RSEQ ABI address to find the abort handler
    to which the instruction pointer needs to be set.

    To achieve above goals expose the RSEQ ABI address and the signature value
    with the new ptrace request PTRACE_GET_RSEQ_CONFIGURATION.

    This new ptrace request can also be used by debuggers so they are aware
    of stops within restartable sequences in progress.

    Signed-off-by: Piotr Figiel <figiel@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Michal Miroslaw <emmir@google.com>
    Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Acked-by: Oleg Nesterov <oleg@redhat.com>
    Link: https://lkml.kernel.org/r/20210226135156.1081606-1-figiel@google.com

With that said, to make use of CRIU with the latest features it is highly recommended to use a recent kernel.

And soon it will be available as a package in Buildroot.

Use b4 for kernel contributions

There is a little tool called b4 [1] that has been part of my workflow with the Linux kernel for a while. It was developed to simplify the work of maintainers, but my main use of the tool has been to fetch patch series from the mailing list and apply them to my local git repository during reviews. I recently noticed that it has gained a lot of handy (though still experimental) features for contributors as well, which I now want to test!

The reason I started digging into my toolbox was a discussion I had with a friend about how I could possibly prefer an email-based workflow when working with FOSS over new and fresh web-based tools like GitHub and GitLab. In web-based tools, all you have to do is push the right button, and email is so old-fashioned, isn't it?

First of all, I never hit the right button. Never. Secondly, it disturbs my workflow.

That is also my biggest reason; these "old school" tools fit my workflow perfectly:

  • I use mutt [4] to read and send my emails.
  • I use vim [5] for all text manipulation.
  • I use git [6] to manage my code.
  • I use b4 [1] in the (kernel) review process (and in contributions from now on)

These are the tools that I use for almost every email-based FOSS project that I'm involved with. The first three are rather common, but what about b4? There is not much information out there, so I think a little introduction might be good.

/media/b4.png

So, what is b4?

The project started as a tool named get-lore-mbox [2], which later became b4. A fun fact is that the name b4 was chosen for ease of typing and because B-4 was the precursor to Lore and Data in the Star Trek universe :-)

B4 is a tool to simplify the development workflow with distributed patches, especially mailing lists. It works with public inbox archives and aims to be a useful tool for both maintainers and developers.

Examples of what b4 will do for you:

  • Retrieve patch series from a public mailing list (e.g. lore.kernel.org)
  • Compare patch series
  • Apply patch series to your git repository
  • Prepare and send in your work
  • Retrieve code-review trailers

It is a pretty competent tool.

Install b4

First we have to install b4 on our system.

B4 is probably already available in your distribution:

Archlinux:

$ pacman -Sy b4

Ubuntu:

$ apt-get install b4

Fedora:

$ dnf install b4

Or whatever package manager you use.

It is also possible to install it with pip:

$ python3 -m pip install --user b4

And of course, you can run it directly from the git repository [3]:

$ git clone https://git.kernel.org/pub/scm/utils/b4/b4.git
$ cd b4
$ git submodule update --init
$ pip install --user -r requirements.txt

Review patches workflow

I use b4 shazam to fetch the latest version of a patch series and apply it to my tree. All you need to provide is the Message-ID of the thread, which you find in the email header of the patch. For instance:

$ b4 shazam 20230820102610.755188-6-marcus.folkesson@gmail.com
Grabbing thread from lore.kernel.org/all/20230820102610.755188-6-marcus.folkesson@gmail.com/t.mbox.gz
Checking for newer revisions
Grabbing search results from lore.kernel.org
  Added from v8: 7 patches
Analyzing 17 messages in the thread
Will use the latest revision: v8
You can pick other revisions using the -vN flag
Checking attestation on all messages, may take a moment...
---
  [PATCH v8 1/6] dt-bindings: iio: adc: mcp3911: add support for the whole MCP39xx family
  [PATCH v8 2/6] iio: adc: mcp3911: make use of dev_err_probe()
  [PATCH v8 3/6] iio: adc: mcp3911: simplify usage of spi->dev
  [PATCH v8 4/6] iio: adc: mcp3911: fix indentation
  [PATCH v8 5/6] iio: adc: mcp3911: avoid ambiguity parameters in macros
  [PATCH v8 6/6] iio: adc: mcp3911: add support for the whole MCP39xx family
  ---
  NOTE: install dkimpy for DKIM signature verification
---
Total patches: 6
---
 Base: using specified base-commit b320441c04c9bea76cbee1196ae55c20288fd7a6
Applying: dt-bindings: iio: adc: mcp3911: add support for the whole MCP39xx family
Applying: iio: adc: mcp3911: make use of dev_err_probe()
Applying: iio: adc: mcp3911: simplify usage of spi->dev
Applying: iio: adc: mcp3911: fix indentation
Applying: iio: adc: mcp3911: avoid ambiguity parameters in macros
Applying: iio: adc: mcp3911: add support for the whole MCP39xx family

Or even use b4 prep to create a branch for it. This will not fetch the latest version, though:

$ b4 prep -n review -F  20230820102610.755188-6-marcus.folkesson@gmail.com
Checking attestation on all messages, may take a moment...
---
  [PATCH v7 1/6] dt-bindings: iio: adc: mcp3911: add support for the whole MCP39xx family
  [PATCH v7 2/6] iio: adc: mcp3911: make use of dev_err_probe()
  [PATCH v7 3/6] iio: adc: mcp3911: simplify usage of spi->dev
  [PATCH v7 4/6] iio: adc: mcp3911: fix indentation
  [PATCH v7 5/6] iio: adc: mcp3911: avoid ambiguity parameters in macros
  [PATCH v7 6/6] iio: adc: mcp3911: add support for the whole MCP39xx family
  ---
  NOTE: install dkimpy for DKIM signature verification
---
Created new branch b4/review
Applying 6 patches
---
Applying: dt-bindings: iio: adc: mcp3911: add support for the whole MCP39xx family
Applying: iio: adc: mcp3911: make use of dev_err_probe()
Applying: iio: adc: mcp3911: simplify usage of spi->dev
Applying: iio: adc: mcp3911: fix indentation
Applying: iio: adc: mcp3911: avoid ambiguity parameters in macros
Applying: iio: adc: mcp3911: add support for the whole MCP39xx family

Once you have the patches applied to your local repository, it is easier to perform a review, as you get better context when you can jump around in the codebase. It also allows you to run scripts for sanity checks and such.

That is pretty much how I use b4 in the review process. b4 has a lot more neat features, such as fetching pull requests or generating thank-you emails when something gets merged/applied, but that is nothing I use for the moment.

Contributor's workflow

As I said, I've been unaware that b4 can assist you even in the workflow as a contributor. I'm so excited!

The workflow

These steps are more or less copied directly from the documentation [1]:

  1. Prepare your patch series by using b4 prep and queueing your commits. Use git rebase -i to arrange the commits in the right order and to write good commit messages.
  2. Prepare your cover letter using b4 prep --edit-cover. You should provide a good overview of what your series does and why you think it will improve the current code.
  3. When you are almost ready to send, use b4 prep --auto-to-cc to collect the relevant addresses from your commits. If your project uses a MAINTAINERS file, this will also perform the required query to figure out who should be included on your patch series submission.
  4. Review the list of addresses that were added to the cover letter and, if you know what you're doing, remove any that you think are unnecessary.
  5. Send your series using b4 send. This will automatically reroll your series to the next version and add changelog entries to the cover letter.
  6. Await code review and feedback from maintainers.
  7. Apply any received code-review trailers using b4 trailers -u.
  8. Use git rebase -i to make any changes to the code based on the feedback you receive. Remember to record these changes in the cover letter's changelog.
  9. Unless series is accepted upstream, GOTO 3.
  10. Clean up obsolete prep-managed branches using b4 prep --cleanup

Example of usage

A Lego-loving friend of mine pointed out that a reference in the kernel documentation [6] I wrote no longer points to the site it was supposed to. That is something we are going to fix!

Just follow the steps listed above.

Start by preparing the tree with b4 prep -n pxrc -f v6.3:

$ b4 prep -n pxrc -f v6.3
Created new branch b4/pxrc
Created the default cover letter, you can edit with --edit-cover.

Now we have a branch for our work, based on the v6.3 tag.

The next step is to edit the cover letter with b4 prep --edit-cover.

Here is what the first patch (the cover letter) looks like after editing:

$ git show HEAD
commit b97650b087d88d113b11cd1bc367c67c314d77a1 (HEAD -> b4/pxrc)
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Fri Aug 25 11:43:52 2023 +0200

    Remove reference to site that is no longer related
    
    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
    
    --- b4-submit-tracking ---
    # This section is used internally by b4 prep for tracking purposes.
    {
      "series": {
        "revision": 1,
        "change-id": "20230825-pxrc-8518b297cd21",
        "prefixes": []
      }
    }

Notice the meta information about this series (revision, change-id). The change-id will follow us through all versions of the patch.

Next, commit the changes as usual with git add and git commit:

$ git add Documentation/input/devices/pxrc.rst
$ git commit --signoff
$ git show
commit 00cf9f943529c06b36e89407aecc18d46f1b028e (HEAD -> b4/pxrc)
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Thu Aug 24 15:47:24 2023 +0200

    input: docs: pxrc: remove reference to phoenix-sim
    
    The reference undeniably points to something unrelated nowadays.
    Remove it.
    
    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

diff --git a/Documentation/input/devices/pxrc.rst b/Documentation/input/devices/pxrc.rst
index ca11f646bae8..5a86df4ad079 100644
--- a/Documentation/input/devices/pxrc.rst
+++ b/Documentation/input/devices/pxrc.rst
@@ -5,7 +5,7 @@ pxrc - PhoenixRC Flight Controller Adapter
 :Author: Marcus Folkesson <marcus.folkesson@gmail.com>
 
 This driver let you use your own RC controller plugged into the
-adapter that comes with PhoenixRC [1]_ or other compatible adapters.
+adapter that comes with PhoenixRC or other compatible adapters.
 
 The adapter supports 7 analog channels and 1 digital input switch.
 
@@ -41,7 +41,7 @@ Manual Testing
 ==============
 
 To test this driver's functionality you may use `input-event` which is part of
-the `input layer utilities` suite [2]_.
+the `input layer utilities` suite [1]_.
 
 For example::
 
@@ -53,5 +53,4 @@ To print all input events from input `devnr`.
 References
 ==========
 
-.. [1] http://www.phoenix-sim.com/
-.. [2] https://www.kraxel.org/cgit/input/
+.. [1] https://www.kraxel.org/cgit/input/

Verify that the patch looks good with scripts/checkpatch.pl:

$ ./scripts/checkpatch.pl --git HEAD
total: 0 errors, 0 warnings, 22 lines checked

Commit 00cf9f943529 ("input: docs: pxrc: remove reference to phoenix-sim") has no obvious style problems and is ready for submission.

Ok, the patch is ready to be sent out to the mailing list.

Collect all To: and Cc: addresses from scripts/get_maintainer.pl with b4 prep --auto-to-cc:

$ b4 prep --auto-to-cc
Will collect To: addresses using get_maintainer.pl
Will collect Cc: addresses using get_maintainer.pl
Collecting To/Cc addresses
    + To: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    + To: Jonathan Corbet <corbet@lwn.net>
    + Cc: linux-input@vger.kernel.org
    + Cc: linux-doc@vger.kernel.org
    + Cc: linux-kernel@vger.kernel.org
---
You can trim/expand this list with: b4 prep --edit-cover
Invoking git-filter-repo to update the cover letter.
New history written in 0.02 seconds...
Completely finished after 0.18 seconds.

Now we are ready to send the patch to the mailing list by invoking the b4 send command.

b4 send will automatically use the [sendemail] section of your git config to determine which SMTP server to use.

I've configured git to use Gmail as SMTP server; here is the relevant part of my ~/.gitconfig:

[sendemail]
  smtpserver = smtp.gmail.com
  smtpuser = marcus.folkesson@gmail.com
  smtpserverport = 587
  smtpencryption = tls
  smtpssl = true
  chainreplyto = false
  confirm = auto

b4 send will send the patches to the mailing list and prepare for version 2 of the patch series by increasing the version number and creating a new tag.

Other features

Compare between versions

As I only have one version of the patch (and there will probably not be a version 2), I have no use for the cool compare feature. However, b4 lets you compare your versions simply by:

b4 prep --compare-to v1

Trailers

Going through all mail threads and collecting all trailer tags can be painstaking work. B4 will do this for you and magically put the tags into the right patches.

Unfortunately, I forgot to mention my friend in the original patch, so I sent another mail [7] with the Suggested-by: Mark Olsson <mark@markolsson.se> tag.

Fetch the tag with b4 trailers -S -u (-S because the tag was sent by me and not by Mark):

$ b4 trailers -S -u
Calculating patch-ids from commits, this may take a moment...
Checking change-id "20230824-pxrc-doc-1addbaa2250f"
Grabbing search results from lore.kernel.org
---
  input: docs: pxrc: remove reference to phoenix-sim
    + Suggested-by: Mark Olsson <mark@markolsson.se>
---
Invoking git-filter-repo to update trailers.
New history written in 0.03 seconds...
Completely finished after 0.17 seconds.
Trailers updated.

As we can see, the Suggested-by tag is now applied to the patch in the right place:

commit 650264be66f0e589cf67a49f769ddc7d51e076cb (HEAD -> b4/pxrc)
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Thu Aug 24 15:47:24 2023 +0200

    input: docs: pxrc: remove reference to phoenix-sim
    
    The reference undeniably points to something unrelated nowadays.
    Remove it.
    
    Suggested-by: Mark Olsson <mark@markolsson.se>
    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

Magic.

Further reading

The creator of the tool, Konstantin Ryabitsev, has an excellent YouTube video [8] where he demonstrates the usage of b4.

Linux wireless regulatory domains

I had a case with an embedded system that was supposed to act as a WiFi Access Point on the 5GHz band. The HW was capable and the system managed to act as a client to 5GHz networks, so everything looked good.

However, the system could not create an access point on some frequencies. Why is that? It is all about regulatory domains!

/media/tux-radio.png

Regulatory domains

Radio regulations apply to all devices that transmit in the radio spectrum. The Linux kernel complies with these regulations by making the regulatory restrictions a direct part of the cfg80211 configuration API that all (new) wireless device drivers use.

Radio regulation has not always been so tightly integrated into the Linux kernel. The integration is a result of addressing vendor concerns [6] about getting Linux-based products certified against all the (geographically dependent) radio regulatory authorities out there.

Before that, a wireless driver used to be a proprietary blob that we loaded into our kernel. Nowadays more and more vendors have a FOSS-driven development model for such drivers, which is what we strive for.

To build trust with the chip vendors, so that they can consider FOSS drivers a real alternative, we stick to a number of principles.

Principles

There are a few principles [1] that the Linux kernel follows in order to fulfill the regulatory requirements on use of the radio spectrum:

  • It should be reasonably impossible for a user to fail to comply with local regulations either unwittingly or by accident.
  • Default configurations should err in favor of more restrictive behavior in order to protect unwitting users.
  • Configurations that have no known compliant usage should not be part of the 'official' kernel tree.
  • Configurations that are compliant only under special circumstances (e.g. with special licenses) should not be part of the 'official' kernel tree. Any exceptions should have their legal requirements clearly marked and those options should never be configured by default.
  • Configurations that disable regulatory enforcement mechanisms should not be part of the 'official' kernel tree.
  • The kernel should rely on userland components to determine regulatory policy. Consequently, the kernel's regulatory enforcement mechanisms should be flexible enough to cover known or reasonably anticipated regulatory policies.
  • It is the moral duty of responsible distribution vendors, software developers, and community members to make every good faith effort to ensure proper compliance with applicable regulations regarding wireless communications.

The overall approach is "better safe than sorry" with respect to radio regulations. In other words, if no local configuration is setup, the system will fall back to the more restrictive world regulatory domain.

An example of such behavior is that the system is not allowed to initiate radio communication on radio frequencies that are not globally allowed.

Integration

CRDA

(Used pre Linux v4.15.)

CRDA [3], the Central Regulatory Domain Agent, is a userspace agent responsible for reading and interpreting the regulatory.bin file and updating the regulatory domain.

CRDA is intended to be triggered by uevents from the kernel (via udev) upon changes in the regulatory domain, and to set up the new regulations.

The udev rule to do this may look like this:

KERNEL=="regulatory*", ACTION=="change", SUBSYSTEM=="platform", RUN+="/sbin/crda"

Nowadays, CRDA is no longer needed. As of kernel v4.15 (commit [2], "cfg80211: support loading regulatory database as firmware file"), the regulatory database is read by the Linux kernel directly as a firmware file during boot.

wireless-regdb

wireless-regdb [4] is the regulatory database used by Linux. The db.txt file in the repository contains regulatory information for each domain.

The output of this project is regulatory.db, which is loaded by the kernel as a firmware file. Its integrity is typically ensured by verifying the built-in RSA signature against a list of public keys in a preconfigured directory.

Although it is possible to build regulatory.db without any RSA signature checking, it is highly recommended not to do so; if the regulatory database were compromised in some way, we could end up with a product that violates the radio regulations.

wireless-regdb and Yocto

A side note for Yocto users.

The wireless-regdb recipe is part of oe-core [5] and should be included in your image if you intend to use any wireless LAN. wireless-regdb-static should be used with kernels >= v4.15, while wireless-regdb is intended to be used with CRDA.

In other words, add:

IMAGE_INSTALL:append = " wireless-regdb-static "

to your Yocto distribution.

Hands on

iw [7] is the nl80211-based tool we use to configure wireless devices in Linux.

Here we will take a look at what the regulations may look like.

World regulatory domain

# iw reg get
global
country 00: DFS-UNSET
        (755 - 928 @ 2), (N/A, 20), (N/A), PASSIVE-SCAN
        (2402 - 2472 @ 40), (N/A, 20), (N/A)
        (2457 - 2482 @ 20), (N/A, 20), (N/A), AUTO-BW, PASSIVE-SCAN
        (2474 - 2494 @ 20), (N/A, 20), (N/A), NO-OFDM, PASSIVE-SCAN
        (5170 - 5250 @ 80), (N/A, 20), (N/A), AUTO-BW, PASSIVE-SCAN
        (5250 - 5330 @ 80), (N/A, 20), (0 ms), DFS, AUTO-BW, PASSIVE-SCAN
        (5490 - 5730 @ 160), (N/A, 20), (0 ms), DFS, PASSIVE-SCAN
        (5735 - 5835 @ 80), (N/A, 20), (N/A), PASSIVE-SCAN
        (57240 - 63720 @ 2160), (N/A, 0), (N/A)

Country 00 is the world regulatory domain. This could be the result of a system that failed to load the regulatory database.

Look at the output from dmesg to verify:

$ dmesg | grep cfg80211
[    3.268852] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    3.269107] cfg80211: failed to load regulatory.db

As a result, these are the restrictions we have on the 5GHz band:

# iw list
[...]
                Frequencies:
                        * 5040 MHz [8] (disabled)
                        * 5060 MHz [12] (disabled)
                        * 5080 MHz [16] (disabled)
                        * 5170 MHz [34] (disabled)
                        * 5190 MHz [38] (20.0 dBm) (no IR)
                        * 5210 MHz [42] (20.0 dBm) (no IR)
                        * 5230 MHz [46] (20.0 dBm) (no IR)
                        * 5180 MHz [36] (20.0 dBm) (no IR)
                        * 5200 MHz [40] (20.0 dBm) (no IR)
                        * 5220 MHz [44] (20.0 dBm) (no IR)
                        * 5240 MHz [48] (20.0 dBm) (no IR)
                        * 5260 MHz [52] (20.0 dBm) (no IR, radar detection)
                        * 5280 MHz [56] (20.0 dBm) (no IR, radar detection)
                        * 5300 MHz [60] (20.0 dBm) (no IR, radar detection)
                        * 5320 MHz [64] (20.0 dBm) (no IR, radar detection)
                        * 5500 MHz [100] (20.0 dBm) (no IR, radar detection)
                        * 5520 MHz [104] (20.0 dBm) (no IR, radar detection)
                        * 5540 MHz [108] (20.0 dBm) (no IR, radar detection)
                        * 5560 MHz [112] (20.0 dBm) (no IR, radar detection)
                        * 5580 MHz [116] (20.0 dBm) (no IR, radar detection)
                        * 5600 MHz [120] (20.0 dBm) (no IR, radar detection)
                        * 5620 MHz [124] (20.0 dBm) (no IR, radar detection)
                        * 5640 MHz [128] (20.0 dBm) (no IR, radar detection)
                        * 5660 MHz [132] (20.0 dBm) (no IR, radar detection)
                        * 5680 MHz [136] (20.0 dBm) (no IR, radar detection)
                        * 5700 MHz [140] (20.0 dBm) (no IR, radar detection)
                        * 5745 MHz [149] (20.0 dBm) (no IR)
                        * 5765 MHz [153] (20.0 dBm) (no IR)
                        * 5785 MHz [157] (20.0 dBm) (no IR)
                        * 5805 MHz [161] (20.0 dBm) (no IR)
                        * 5825 MHz [165] (20.0 dBm) (no IR)
[...]

We can see that the no IR flag is set for almost all frequencies on the 5GHz band. Please note that NO-IR is not the same as disabled; it simply means that we cannot initiate radiation on those frequencies.

"Initiate radiation" includes all modes of operation that require us to initiate radiation first. Think of acting as an Access Point, IBSS, Mesh or P2P master.

We can still use those frequencies though; there is no problem connecting to an Access Point on them (we are not the one initiating the radiation).

Local regulatory domain

When a proper regulatory database is loaded into the system, we can set up a local regulatory domain instead of the global one.

Set Swedish (SE) as our local regulatory domain:

# iw reg set SE
# iw reg get
global
country SE: DFS-ETSI
        (2400 - 2483 @ 40), (N/A, 20), (N/A)
        (5150 - 5250 @ 80), (N/A, 23), (N/A), NO-OUTDOOR, AUTO-BW
        (5250 - 5350 @ 80), (N/A, 20), (0 ms), NO-OUTDOOR, DFS, AUTO-BW
        (5470 - 5725 @ 160), (N/A, 26), (0 ms), DFS
        (5725 - 5875 @ 80), (N/A, 13), (N/A)
        (5945 - 6425 @ 160), (N/A, 23), (N/A), NO-OUTDOOR
        (57000 - 71000 @ 2160), (N/A, 40), (N/A)

And we are now allowed to use the 5GHz band with other restrictions:

# iw list
[...]
                Frequencies:
                        * 5040 MHz [8] (disabled)
                        * 5060 MHz [12] (disabled)
                        * 5080 MHz [16] (disabled)
                        * 5170 MHz [34] (23.0 dBm)
                        * 5190 MHz [38] (23.0 dBm)
                        * 5210 MHz [42] (23.0 dBm)
                        * 5230 MHz [46] (23.0 dBm)
                        * 5180 MHz [36] (23.0 dBm)
                        * 5200 MHz [40] (23.0 dBm)
                        * 5220 MHz [44] (23.0 dBm)
                        * 5240 MHz [48] (23.0 dBm)
                        * 5260 MHz [52] (20.0 dBm) (no IR, radar detection)
                        * 5280 MHz [56] (20.0 dBm) (no IR, radar detection)
                        * 5300 MHz [60] (20.0 dBm) (no IR, radar detection)
                        * 5320 MHz [64] (20.0 dBm) (no IR, radar detection)
                        * 5500 MHz [100] (26.0 dBm) (no IR, radar detection)
                        * 5520 MHz [104] (26.0 dBm) (no IR, radar detection)
                        * 5540 MHz [108] (26.0 dBm) (no IR, radar detection)
                        * 5560 MHz [112] (26.0 dBm) (no IR, radar detection)
                        * 5580 MHz [116] (26.0 dBm) (no IR, radar detection)
                        * 5600 MHz [120] (26.0 dBm) (no IR, radar detection)
                        * 5620 MHz [124] (26.0 dBm) (no IR, radar detection)
                        * 5640 MHz [128] (26.0 dBm) (no IR, radar detection)
                        * 5660 MHz [132] (26.0 dBm) (no IR, radar detection)
                        * 5680 MHz [136] (26.0 dBm) (no IR, radar detection)
                        * 5700 MHz [140] (26.0 dBm) (no IR, radar detection)
                        * 5745 MHz [149] (13.0 dBm)
                        * 5765 MHz [153] (13.0 dBm)
                        * 5785 MHz [157] (13.0 dBm)
                        * 5805 MHz [161] (13.0 dBm)
                        * 5825 MHz [165] (13.0 dBm)
[...]

Regulatory flags

Some of the flags reported by iw may not be obvious at first glance. Here is an explanation of some of them:

Flag             Meaning
(no flag)        Can be used without restrictions.
disabled         Disabled.
NO-OUTDOOR       MUST be used indoors only.
DFS              MUST be used with DFS, regardless of indoor or outdoor.
SRD              MUST comply with SRD requirements, regardless of indoor or outdoor.
NO-OUTDOOR/DFS   MUST be used with DFS and indoors only.
NO-OUTDOOR/TPC   MUST be used with TPC and indoors only.
DFS/TPC          MUST be used with DFS and TPC.
DFS/TPC + SRD    MUST be used with DFS, TPC and comply with SRD requirements.
  • DFS: stands for Dynamic Frequency Selection and is a channel allocation scheme used to prevent electromagnetic interference with systems that predate Wi-Fi.
  • TPC: stands for Transmit Power Control, which is a mechanism to automatically reduce the transmission output power when other networks are within range.
  • SRD: stands for Short-Range Device and covers low-power transmitters typically limited to the range of 24-100mW ERP.

Add support for MCP39XX in Linux kernel

I've maintained the MCP3911 driver in the Linux kernel for some time and continuously add support for new features [1] upon requests from people and companies.

Microchip has several ICs in this series of ADCs that work similarly to the MCP3911. Actually, all the other ICs are register compatible with each other; only the MCP3911 differs. The ICs I've extended support for are MCP3910, MCP3912, MCP3913, MCP3914, MCP3918 and MCP3919.

The main differences between these ICs from the driver's perspective are the number of channels, ranging from 1 to 8, and that the register map is not the same for all devices.

/media/mcp39xx.png

Implementation

This is a rather small patch without any fanciness, but it shows how to do this without the macro magic you find in Zephyr [2].

Add compatible strings

The Linux driver infrastructure binds a certain device to a driver by a string (or other unique identifiers such as VID/PID for USB). When, for example, the compatible property of a device tree node matches a device driver, a device is instantiated and the probe function is called.

As a single device driver can handle multiple similar ICs whose properties may differ, we have to differentiate them somehow. This is done by providing device-specific data to each instance of the device. This data is called "driver data" or "private data" and is part of the device lookup table.

E.g. the driver_data field of the struct spi_device_id:

struct spi_device_id {
	char name[SPI_NAME_SIZE];
	kernel_ulong_t driver_data;	/* Data private to the driver */
};

Or the data field of the struct of_device_id:

/*
 * Struct used for matching a device
 */
struct of_device_id {
	char	name[32];
	char	type[32];
	char	compatible[128];
	const void *data;
};

For this driver, the driver data for these ID tables looks as follows:

static const struct of_device_id mcp3911_dt_ids[] = {
-       { .compatible = "microchip,mcp3911" },
+       { .compatible = "microchip,mcp3910", .data = &mcp3911_chip_info[MCP3910] },
+       { .compatible = "microchip,mcp3911", .data = &mcp3911_chip_info[MCP3911] },
+       { .compatible = "microchip,mcp3912", .data = &mcp3911_chip_info[MCP3912] },
+       { .compatible = "microchip,mcp3913", .data = &mcp3911_chip_info[MCP3913] },
+       { .compatible = "microchip,mcp3914", .data = &mcp3911_chip_info[MCP3914] },
+       { .compatible = "microchip,mcp3918", .data = &mcp3911_chip_info[MCP3918] },
+       { .compatible = "microchip,mcp3919", .data = &mcp3911_chip_info[MCP3919] },
    { }
};
MODULE_DEVICE_TABLE(of, mcp3911_dt_ids);

static const struct spi_device_id mcp3911_id[] = {
-       { "mcp3911", 0 },
+       { "mcp3910", (kernel_ulong_t)&mcp3911_chip_info[MCP3910] },
+       { "mcp3911", (kernel_ulong_t)&mcp3911_chip_info[MCP3911] },
+       { "mcp3912", (kernel_ulong_t)&mcp3911_chip_info[MCP3912] },
+       { "mcp3913", (kernel_ulong_t)&mcp3911_chip_info[MCP3913] },
+       { "mcp3914", (kernel_ulong_t)&mcp3911_chip_info[MCP3914] },
+       { "mcp3918", (kernel_ulong_t)&mcp3911_chip_info[MCP3918] },
+       { "mcp3919", (kernel_ulong_t)&mcp3911_chip_info[MCP3919] },
    { }

The driver data is then reachable in the probe function via spi_get_device_match_data():

    adc->chip = spi_get_device_match_data(spi);
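
For context, a simplified probe function could look like this. Note that this is a condensed sketch - error handling and most of the setup are omitted, and it uses the chip_info structure described in the next section:

static int mcp3911_probe(struct spi_device *spi)
{
	struct iio_dev *indio_dev;
	struct mcp3911 *adc;

	indio_dev = devm_iio_device_alloc(&spi->dev, sizeof(*adc));
	if (!indio_dev)
		return -ENOMEM;

	adc = iio_priv(indio_dev);
	adc->spi = spi;

	/* Pick up the per-device data from the matched ID table entry */
	adc->chip = spi_get_device_match_data(spi);

	/* The channel specification now comes from the driver data */
	indio_dev->channels = adc->chip->channels;
	indio_dev->num_channels = adc->chip->num_channels;

	/* Device-specific configuration via the chip_info callback */
	return adc->chip->config(adc);
}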

Driver data

The driver data is used to distinguish between the different devices and provides enough information to make it possible for the driver to handle all differences between the ICs in a common way.

The driver data for these devices looks as follows:

+struct mcp3911_chip_info {
+       const struct iio_chan_spec *channels;
+       unsigned int num_channels;
+
+       int (*config)(struct mcp3911 *adc);
+       int (*get_osr)(struct mcp3911 *adc, int *val);
+       int (*set_osr)(struct mcp3911 *adc, int val);
+       int (*get_offset)(struct mcp3911 *adc, int channel, int *val);
+       int (*set_offset)(struct mcp3911 *adc, int channel, int val);
+       int (*set_scale)(struct mcp3911 *adc, int channel, int val);
+};
+

Description of the structure members:

  • .channels is a pointer to struct iio_chan_spec where all ADC and timestamp channels are specified.
  • .num_channels is the number of channels.
  • .config is a function pointer used to configure the device.
  • .get_* and .set_* are function pointers used to get/set certain registers.

A struct mcp3911_chip_info is created for each type of supported IC:

+static const struct mcp3911_chip_info mcp3911_chip_info[] = {
+       [MCP3910] = {
+               .channels = mcp3910_channels,
+               .num_channels = ARRAY_SIZE(mcp3910_channels),
+               .config = mcp3910_config,
+               .get_osr = mcp3910_get_osr,
+               .set_osr = mcp3910_set_osr,
+               .get_offset = mcp3910_get_offset,
+               .set_offset = mcp3910_set_offset,
+               .set_scale = mcp3910_set_scale,
+       },
+       [MCP3911] = {
+               .channels = mcp3911_channels,
+               .num_channels = ARRAY_SIZE(mcp3911_channels),
+               .config = mcp3911_config,
+               .get_osr = mcp3911_get_osr,
+               .set_osr = mcp3911_set_osr,
+               .get_offset = mcp3911_get_offset,
+               .set_offset = mcp3911_set_offset,
+               .set_scale = mcp3911_set_scale,
+       },
+       [MCP3912] = {
+               .channels = mcp3912_channels,
+               .num_channels = ARRAY_SIZE(mcp3912_channels),
+               .config = mcp3910_config,
+               .get_osr = mcp3910_get_osr,
+               .set_osr = mcp3910_set_osr,
+               .get_offset = mcp3910_get_offset,
+               .set_offset = mcp3910_set_offset,
+               .set_scale = mcp3910_set_scale,
+       },
+       [MCP3913] = {
+               .channels = mcp3913_channels,
+               .num_channels = ARRAY_SIZE(mcp3913_channels),
+               .config = mcp3910_config,
+               .get_osr = mcp3910_get_osr,
+               .set_osr = mcp3910_set_osr,
+               .get_offset = mcp3910_get_offset,
+               .set_offset = mcp3910_set_offset,
+               .set_scale = mcp3910_set_scale,
+       },
+       [MCP3914] = {
+               .channels = mcp3914_channels,
+               .num_channels = ARRAY_SIZE(mcp3914_channels),
+               .config = mcp3910_config,
+               .get_osr = mcp3910_get_osr,
+               .set_osr = mcp3910_set_osr,
+               .get_offset = mcp3910_get_offset,
+               .set_offset = mcp3910_set_offset,
+               .set_scale = mcp3910_set_scale,
+       },
+       [MCP3918] = {
+               .channels = mcp3918_channels,
+               .num_channels = ARRAY_SIZE(mcp3918_channels),
+               .config = mcp3910_config,
+               .get_osr = mcp3910_get_osr,
+               .set_osr = mcp3910_set_osr,
+               .get_offset = mcp3910_get_offset,
+               .set_offset = mcp3910_set_offset,
+               .set_scale = mcp3910_set_scale,
+       },
+       [MCP3919] = {
+               .channels = mcp3919_channels,
+               .num_channels = ARRAY_SIZE(mcp3919_channels),
+               .config = mcp3910_config,
+               .get_osr = mcp3910_get_osr,
+               .set_osr = mcp3910_set_osr,
+               .get_offset = mcp3910_get_offset,
+               .set_offset = mcp3910_set_offset,
+               .set_scale = mcp3910_set_scale,
+       },
+};

Thanks to this, all differences between the ICs are in one place and the driver code is common for all devices. See the code below for how the oversampling ratio is set; the differences between the ICs are handled by the callback function:

        case IIO_CHAN_INFO_OVERSAMPLING_RATIO:
                for (int i = 0; i < ARRAY_SIZE(mcp3911_osr_table); i++) {
                        if (val == mcp3911_osr_table[i]) {
-                               val = FIELD_PREP(MCP3911_CONFIG_OSR, i);
-                               ret = mcp3911_update(adc, MCP3911_REG_CONFIG, MCP3911_CONFIG_OSR,
-                                               val, 2);
+                               ret = adc->chip->set_osr(adc, i);
                                break;
                        }
                }
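
To make it concrete, here is a sketch of what two of the set_osr callbacks could look like. The MCP3911 names and the call to mcp3911_update() are taken from the diff above; the MCP3910 register and field names are from my recollection of the driver and may differ slightly:

static int mcp3911_set_osr(struct mcp3911 *adc, int val)
{
	/* The MCP3911 keeps OSR in its CONFIG register (2 bytes wide) */
	u32 v = FIELD_PREP(MCP3911_CONFIG_OSR, val);

	return mcp3911_update(adc, MCP3911_REG_CONFIG, MCP3911_CONFIG_OSR, v, 2);
}

static int mcp3910_set_osr(struct mcp3911 *adc, int val)
{
	/* The MCP3910 family keeps OSR in CONFIG0 (3 bytes wide) */
	u32 v = FIELD_PREP(MCP3910_CONFIG0_OSR, val);

	return mcp3911_update(adc, MCP3910_REG_CONFIG0, MCP3910_CONFIG0_OSR, v, 3);
}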