Bug in the iMX8MP ECSPI module?

Background

I have a system where I can swap between iMX8M Mini and iMX8M Plus CPU modules on the same carrier board.

I wrote a SPI driver for a device on the carrier board. The device is connected to ECSPI1 (the CPU contains several ECSPI modules) and uses hardware chipselect 0 (SS0). The driver had been used with the iMX8MM CPU module for a while, but as soon as I swapped to the iMX8MP it stopped working.

Both iMX8MM and iMX8MP have the same ECSPI IP block, managed by the spi-imx [1] Linux kernel driver, and the application and root filesystem are the same as well.

Same driver, same application, different module. What is happening?

The driver layer did not report anything suspicious either; all SPI transactions contained the data I expected and were successfully sent out on the bus. After debugging the application, driver and devicetree for a while, I took a closer look at the actual SPI signals.

SPI signals

I'm not going to describe the SPI interface specifications, please see Wikipedia [2] or such for more details.

It turns out that the chip select goes inactive after each byte sent, which is weird behavior. The chipselect should stay low during the whole data transaction.

Here are the signals for one transaction of two bytes:

/media/imx8mp-spi-ss0.jpg

The ECSPI module supports dynamic burst sizes, so I experimented with that, without any success.

Workaround

The best workaround I came up with was to mux the chipselect pin to its GPIO function instead of SS0 and map that GPIO as chipselect for ECSPI1 by overriding the affected properties in the device tree file:

&ecspi1 {
        cs-gpios = <&gpio5 9 GPIO_ACTIVE_LOW>,
                   <&gpio2 8 GPIO_ACTIVE_LOW>;
};

&pinctrl_ecspi1_cs0 {
        fsl,pins = <
                MX8MP_IOMUXC_ECSPI1_SS0__GPIO5_IO09         0x40000
        >;
};

Then the signals look better:

/media/imx8mp-spi-gpio.jpg

Conclusion

I do not know if all ECSPI modules and all HW chipselects are affected, or only SS0 @ ECSPI1. I could not find anything about it in the iMX8MP Errata.

The fact that the workaround works makes me suspect a hardware bug in the iMX8MP processor. I guess we will see if it shows up in the errata later on.

Capture a picture with V4L2

Brief

As we have seen before, cameras in Linux can be a complex [1] story, and you have to watch every step you take to get it right. libcamera [2] does a great job of simplifying this in a platform-independent way and should be used whenever possible.

But not all cameras have a complex flow-chart. Some cameras (e.g. web cameras) are "self-contained": the image data goes straight from the camera to the user application, without any detours through different IP blocks for image processing on the way.

/media/camera-sketch.png

The V4L2 framework is perfectly suited to those simple cameras.

When I searched for a simple example application that explained the necessary steps to capture images from a camera, I simply could not find what I was looking for. This is my attempt to provide what I failed to find.

V4L2 user space API

Video devices are represented by character devices in a Linux system. The devices show up as /dev/video* and support the following operations:

  • open() - Open a video device
  • close() - Close a video device
  • ioctl() - Send ioctl commands to the device
  • mmap() - Map memory to a driver allocated buffer
  • read() - Read from video device
  • write() - Write to the device

The V4L2 API basically relies on a very large set of IOCTL commands to configure properties and behavior of the camera. The whole API is available from the following header:

#include <linux/videodev2.h>

Here is a list of the most common IOCTL commands:

  • VIDIOC_QUERYCAP - Query a list of the supported capabilities. Always query the capabilities to ensure that the camera supports the buffer mode you intend to use.
  • VIDIOC_ENUM_FMT - Enumerate supported image formats.
  • VIDIOC_G_FMT - Get the current image format.
  • VIDIOC_S_FMT - Set a new image format.
  • VIDIOC_REQBUFS - Request a number of buffers that can later be memory mapped by the user application. The application should always check the actual number granted, as the driver may allocate more or fewer than requested.
  • VIDIOC_QUERYBUF - Get buffer information for those buffers earlier requested by VIDIOC_REQBUFS. The information could then be passed to the mmap() system call in order to map that buffer to user space.
  • VIDIOC_QBUF - Queue one of the requested buffers to make it available for the driver to fill with image data. Once the buffer is filled, it is no longer available for new data and should be dequeued by the user.
  • VIDIOC_DQBUF - Dequeue a filled buffer. The command will block if no buffer is available, unless O_NONBLOCK was passed to open().
  • VIDIOC_STREAMON - Turn on streaming. Queued buffers will be filled as soon as data is available.
  • VIDIOC_STREAMOFF - Turn off streaming. This command also flushes the buffer queue.

Buffer management

The V4L2 core maintains two buffer queues internally: one queue (referred to as IN) for incoming (camera->driver) image data and one (referred to as OUT) for outgoing (driver->user) image data.

Buffers are put into the IN queue with the VIDIOC_QBUF command. Once a buffer is filled, it is dequeued from IN and put into the OUT queue, where the data is available to the user.

Whenever the user wants to dequeue a buffer with VIDIOC_DQBUF, and a buffer is available, it is taken from the OUT queue and handed to the user application. If no buffer is available, the dequeue operation waits until a buffer is filled, unless the file descriptor was opened with O_NONBLOCK.

Video data can be pushed to userspace in a few different ways:

  • Read I/O - simply perform a read() operation and do not mess with buffers
  • User pointer - the user application allocates buffers and provides them to the driver
  • DMA buf - mostly used for mem2mem devices
  • mmap - let the driver allocate buffers and mmap(2) them to userspace

This post will *only* focus on mmap:ed buffers!

Typical workflow

We will follow these steps in order to acquire frames from the camera:

/media/v4l2-workflow.png

Query capabilities

VIDIOC_QUERYCAP is used to query the supported capabilities. What is most interesting is to verify that the device supports the mode (V4L2_CAP_STREAMING) we want to work with. It is also good manners to verify that what we have opened actually is a capture device (V4L2_CAP_VIDEO_CAPTURE) and nothing else.

The V4L2 API uses a struct v4l2_capability that is passed to the IOCTL. This structure is defined as follows:

/**
  * struct v4l2_capability - Describes V4L2 device caps returned by VIDIOC_QUERYCAP
  *
  * @driver:           name of the driver module (e.g. "bttv")
  * @card:     name of the card (e.g. "Hauppauge WinTV")
  * @bus_info:         name of the bus (e.g. "PCI:" + pci_name(pci_dev) )
  * @version:          KERNEL_VERSION
  * @capabilities: capabilities of the physical device as a whole
  * @device_caps:  capabilities accessed via this particular device (node)
  * @reserved:         reserved fields for future extensions
  */
struct v4l2_capability {
    __u8    driver[16];
    __u8    card[32];
    __u8    bus_info[32];
    __u32   version;
    __u32   capabilities;
    __u32   device_caps;
    __u32   reserved[3];
};

The v4l2_capability.capabilities field is decoded as follows:

/* Values for 'capabilities' field */
#define V4L2_CAP_VIDEO_CAPTURE              0x00000001  /* Is a video capture device */
#define V4L2_CAP_VIDEO_OUTPUT               0x00000002  /* Is a video output device */
#define V4L2_CAP_VIDEO_OVERLAY              0x00000004  /* Can do video overlay */
#define V4L2_CAP_VBI_CAPTURE                0x00000010  /* Is a raw VBI capture device */
#define V4L2_CAP_VBI_OUTPUT         0x00000020  /* Is a raw VBI output device */
#define V4L2_CAP_SLICED_VBI_CAPTURE 0x00000040  /* Is a sliced VBI capture device */
#define V4L2_CAP_SLICED_VBI_OUTPUT  0x00000080  /* Is a sliced VBI output device */
#define V4L2_CAP_RDS_CAPTURE                0x00000100  /* RDS data capture */
#define V4L2_CAP_VIDEO_OUTPUT_OVERLAY       0x00000200  /* Can do video output overlay */
#define V4L2_CAP_HW_FREQ_SEEK               0x00000400  /* Can do hardware frequency seek  */
#define V4L2_CAP_RDS_OUTPUT         0x00000800  /* Is an RDS encoder */

/* Is a video capture device that supports multiplanar formats */
#define V4L2_CAP_VIDEO_CAPTURE_MPLANE       0x00001000
/* Is a video output device that supports multiplanar formats */
#define V4L2_CAP_VIDEO_OUTPUT_MPLANE        0x00002000
/* Is a video mem-to-mem device that supports multiplanar formats */
#define V4L2_CAP_VIDEO_M2M_MPLANE   0x00004000
/* Is a video mem-to-mem device */
#define V4L2_CAP_VIDEO_M2M          0x00008000

#define V4L2_CAP_TUNER                      0x00010000  /* has a tuner */
#define V4L2_CAP_AUDIO                      0x00020000  /* has audio support */
#define V4L2_CAP_RADIO                      0x00040000  /* is a radio device */
#define V4L2_CAP_MODULATOR          0x00080000  /* has a modulator */

#define V4L2_CAP_SDR_CAPTURE                0x00100000  /* Is a SDR capture device */
#define V4L2_CAP_EXT_PIX_FORMAT             0x00200000  /* Supports the extended pixel format */
#define V4L2_CAP_SDR_OUTPUT         0x00400000  /* Is a SDR output device */
#define V4L2_CAP_META_CAPTURE               0x00800000  /* Is a metadata capture device */

#define V4L2_CAP_READWRITE              0x01000000  /* read/write systemcalls */
#define V4L2_CAP_STREAMING              0x04000000  /* streaming I/O ioctls */
#define V4L2_CAP_META_OUTPUT                0x08000000  /* Is a metadata output device */

#define V4L2_CAP_TOUCH                  0x10000000  /* Is a touch device */

#define V4L2_CAP_IO_MC                      0x20000000  /* Is input/output controlled by the media controller */

#define V4L2_CAP_DEVICE_CAPS            0x80000000  /* sets device capabilities field */

Example code on how to use VIDIOC_QUERYCAP:

void query_capabilites(int fd)
{
    struct v4l2_capability cap;

    if (-1 == ioctl(fd, VIDIOC_QUERYCAP, &cap)) {
        perror("Query capabilites");
        exit(EXIT_FAILURE);
    }

    if (!(cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)) {
        fprintf(stderr, "Device is not a video capture device\n");
        exit(EXIT_FAILURE);
    }

    if (!(cap.capabilities & V4L2_CAP_READWRITE)) {
        fprintf(stderr, "Device does not support read i/o\n");
    }

    if (!(cap.capabilities & V4L2_CAP_STREAMING)) {
        fprintf(stderr, "Device does not support streaming i/o\n");
    }
}

Capabilities could also be read out with v4l2-ctl:

marcus@goliat:~$ v4l2-ctl -d /dev/video4  --info
Driver Info:
    Driver name      : uvcvideo
    Card type        : USB 2.0 Camera: USB Camera
    Bus info         : usb-0000:00:14.0-8.3.1.1
    Driver version   : 6.0.8
    Capabilities     : 0x84a00001
        Video Capture
        Metadata Capture
        Streaming
        Extended Pix Format
        Device Capabilities
    Device Caps      : 0x04200001
        Video Capture
        Streaming
        Extended Pix Format

Set format

Once we know for sure that the device is a capture device and supports the mode we want to use, the next step is to set up the video format. Otherwise the application could receive video frames in a format it cannot deal with.

Supported formats can be queried with VIDIOC_ENUM_FMT and the current video format can be read out with VIDIOC_G_FMT.

The current format can also be fetched with v4l2-ctl:

marcus@goliat:~$ v4l2-ctl -d /dev/video4  --get-fmt-video
Format Video Capture:
    Width/Height      : 320/240
    Pixel Format      : 'YUYV' (YUYV 4:2:2)
    Field             : None
    Bytes per Line    : 640
    Size Image        : 153600
    Colorspace        : sRGB
    Transfer Function : Rec. 709
    YCbCr/HSV Encoding: ITU-R 601
    Quantization      : Default (maps to Limited Range)
    Flags             :
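The numbers in the output above hang together: packed YUYV 4:2:2 stores two bytes per pixel, so the bytes per line is 320 * 2 = 640 and the image size is 640 * 240 = 153600. A sanity-check sketch (the helper names are mine):

```c
/* For packed YUYV 4:2:2, every pixel occupies two bytes. */
unsigned int yuyv_bytesperline(unsigned int width)
{
    return width * 2;
}

/* Total buffer size for one frame: one line's bytes times the line count. */
unsigned int yuyv_sizeimage(unsigned int width, unsigned int height)
{
    return yuyv_bytesperline(width) * height;
}
```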

The v4l2_format struct is defined as follows:

/**
 * struct v4l2_format - stream data format
 * @type:   enum v4l2_buf_type; type of the data stream
 * @pix:    definition of an image format
 * @pix_mp: definition of a multiplanar image format
 * @win:    definition of an overlaid image
 * @vbi:    raw VBI capture or output parameters
 * @sliced: sliced VBI capture or output parameters
 * @raw_data:       placeholder for future extensions and custom formats
 * @fmt:    union of @pix, @pix_mp, @win, @vbi, @sliced, @sdr, @meta
 *          and @raw_data
 */
struct v4l2_format {
    __u32    type;
    union {
        struct v4l2_pix_format              pix;     /* V4L2_BUF_TYPE_VIDEO_CAPTURE */
        struct v4l2_pix_format_mplane       pix_mp;  /* V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE */
        struct v4l2_window          win;     /* V4L2_BUF_TYPE_VIDEO_OVERLAY */
        struct v4l2_vbi_format              vbi;     /* V4L2_BUF_TYPE_VBI_CAPTURE */
        struct v4l2_sliced_vbi_format       sliced;  /* V4L2_BUF_TYPE_SLICED_VBI_CAPTURE */
        struct v4l2_sdr_format              sdr;     /* V4L2_BUF_TYPE_SDR_CAPTURE */
        struct v4l2_meta_format             meta;    /* V4L2_BUF_TYPE_META_CAPTURE */
        __u8        raw_data[200];                   /* user-defined */
    } fmt;
};

Before setting the format, you have to set the v4l2_format.type field to the relevant buffer type; the type selects which member of the fmt union is used.

Example code on how to use VIDIOC_S_FMT:

int set_format(int fd) {
    struct v4l2_format format = {0};
    format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    format.fmt.pix.width = 320;
    format.fmt.pix.height = 240;
    format.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    format.fmt.pix.field = V4L2_FIELD_NONE;
    int res = ioctl(fd, VIDIOC_S_FMT, &format);
    if(res == -1) {
        perror("Could not set format");
        exit(1);
    }
    return res;
}

Request buffers

Once we are done with the format preparations, the next step is to allocate buffers so we have somewhere to store the images.

This is exactly what the VIDIOC_REQBUFS ioctl does for you. The command takes a struct v4l2_requestbuffers as argument:

struct v4l2_requestbuffers {
    __u32                   count;
    __u32                   type;           /* enum v4l2_buf_type */
    __u32                   memory;         /* enum v4l2_memory */
    __u32                   capabilities;
    __u8                    flags;
    __u8                    reserved[3];
};

Some of these fields must be populated before we can use the structure:

  • v4l2_requestbuffers.count - Should be set to the number of memory buffers to allocate. It is important to request enough buffers so that frames won't be dropped due to a lack of queued buffers. The driver decides the minimum number. The application should always check this field on return, as the driver may grant more or fewer buffers than the application actually requested.
  • v4l2_requestbuffers.type - As we are going to use a camera device, set this to V4L2_BUF_TYPE_VIDEO_CAPTURE.
  • v4l2_requestbuffers.memory - Set the streaming method. Available values are V4L2_MEMORY_MMAP, V4L2_MEMORY_USERPTR and V4L2_MEMORY_DMABUF.

Example code on how to use VIDIOC_REQBUFS:

int request_buffer(int fd, int count) {
    struct v4l2_requestbuffers req = {0};
    req.count = count;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    if (-1 == ioctl(fd, VIDIOC_REQBUFS, &req))
    {
        perror("Requesting Buffer");
        exit(1);
    }
    return req.count;
}

Query buffer

After the buffers are allocated by the kernel, we have to query the offset of each allocated buffer in order to mmap() them. Note that the offset is a "cookie" to pass to mmap(), not a physical address.

The VIDIOC_QUERYBUF ioctl works with the struct v4l2_buffer:

/**
 * struct v4l2_buffer - video buffer info
 * @index:  id number of the buffer
 * @type:   enum v4l2_buf_type; buffer type (type == *_MPLANE for
 *          multiplanar buffers);
 * @bytesused:      number of bytes occupied by data in the buffer (payload);
 *          unused (set to 0) for multiplanar buffers
 * @flags:  buffer informational flags
 * @field:  enum v4l2_field; field order of the image in the buffer
 * @timestamp:      frame timestamp
 * @timecode:       frame timecode
 * @sequence:       sequence count of this frame
 * @memory: enum v4l2_memory; the method, in which the actual video data is
 *          passed
 * @offset: for non-multiplanar buffers with memory == V4L2_MEMORY_MMAP;
 *          offset from the start of the device memory for this plane,
 *          (or a "cookie" that should be passed to mmap() as offset)
 * @userptr:        for non-multiplanar buffers with memory == V4L2_MEMORY_USERPTR;
 *          a userspace pointer pointing to this buffer
 * @fd:             for non-multiplanar buffers with memory == V4L2_MEMORY_DMABUF;
 *          a userspace file descriptor associated with this buffer
 * @planes: for multiplanar buffers; userspace pointer to the array of plane
 *          info structs for this buffer
 * @m:              union of @offset, @userptr, @planes and @fd
 * @length: size in bytes of the buffer (NOT its payload) for single-plane
 *          buffers (when type != *_MPLANE); number of elements in the
 *          planes array for multi-plane buffers
 * @reserved2:      drivers and applications must zero this field
 * @request_fd: fd of the request that this buffer should use
 * @reserved:       for backwards compatibility with applications that do not know
 *          about @request_fd
 *
 * Contains data exchanged by application and driver using one of the Streaming
 * I/O methods.
 */
struct v4l2_buffer {
    __u32                   index;
    __u32                   type;
    __u32                   bytesused;
    __u32                   flags;
    __u32                   field;
    struct timeval          timestamp;
    struct v4l2_timecode    timecode;
    __u32                   sequence;

    /* memory location */
    __u32                   memory;
    union {
        __u32           offset;
        unsigned long   userptr;
        struct v4l2_plane *planes;
        __s32               fd;
    } m;
    __u32                   length;
    __u32                   reserved2;
    union {
        __s32               request_fd;
        __u32               reserved;
    };
};

The structure contains a lot of fields, but in our mmap() example, we only need to fill out a few:

  • v4l2_buffer.type - Buffer type, we use V4L2_BUF_TYPE_VIDEO_CAPTURE.
  • v4l2_buffer.memory - Memory method, still go for V4L2_MEMORY_MMAP.
  • v4l2_buffer.index - As we probably have requested multiple buffers and want to mmap each of them, we have to distinguish the buffers somehow. The index field is the buffer id, ranging from 0 to v4l2_requestbuffers.count - 1.

Example code on how to use VIDIOC_QUERYBUF:

int query_buffer(int fd, int index, unsigned char **buffer) {
    struct v4l2_buffer buf = {0};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = index;
    if (-1 == ioctl(fd, VIDIOC_QUERYBUF, &buf)) {
        perror("Could not query buffer");
        return -1;
    }

    *buffer = mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, buf.m.offset);
    if (MAP_FAILED == *buffer) {
        perror("mmap");
        return -1;
    }

    return buf.length;
}

Queue buffers

Before the buffers can be filled with data, they have to be enqueued. Enqueuing locks the memory pages used so that they cannot be swapped out. The buffers remain locked until they are dequeued, the device is closed, or streaming is turned off.

VIDIOC_QBUF takes the same argument as VIDIOC_QUERYBUF and has to be populated the same way.

Example code on how to use VIDIOC_QBUF:

int queue_buffer(int fd, int index) {
    struct v4l2_buffer bufd = {0};
    bufd.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    bufd.memory = V4L2_MEMORY_MMAP;
    bufd.index = index;
    if(-1 == ioctl(fd, VIDIOC_QBUF, &bufd))
    {
        perror("Queue Buffer");
        return 1;
    }
    return bufd.bytesused;
}

Start stream

Finally, all preparations are done and we are ready to start the stream! VIDIOC_STREAMON basically informs the V4L2 layer that it can start acquiring video frames and use the queued buffers to store them.

Example code on how to use VIDIOC_STREAMON:

void start_streaming(int fd) {
    unsigned int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(ioctl(fd, VIDIOC_STREAMON, &type) == -1){
        perror("VIDIOC_STREAMON");
        exit(1);
    }
}

Dequeue buffer

Once buffers are filled with video data, they are ready to be dequeued and consumed by the application. This ioctl blocks (unless O_NONBLOCK is used) until a buffer is available.

As soon as a buffer has been dequeued and processed, the application should queue it back so that the driver layer can fill it with new frames. This is usually part of the application's main loop.

VIDIOC_DQBUF works similarly to VIDIOC_QBUF, but it populates the v4l2_buffer.index field with the index number of the buffer that has been dequeued.

Example code on how to use VIDIOC_DQBUF:

int dequeue_buffer(int fd) {
    struct v4l2_buffer bufd = {0};
    bufd.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    bufd.memory = V4L2_MEMORY_MMAP;
    if(-1 == ioctl(fd, VIDIOC_DQBUF, &bufd))
    {
        perror("DeQueue Buffer");
        return -1;
    }
    return bufd.index;
}

Stop stream

Once we are done with the video capturing, we can stop the streaming. This will unlock all enqueued buffers and stop capturing frames.

Example code on how to use VIDIOC_STREAMOFF:

void stop_streaming(int fd) {
    unsigned int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(ioctl(fd, VIDIOC_STREAMOFF, &type) == -1){
        perror("VIDIOC_STREAMOFF");
        exit(1);
    }
}

Full example

It is not the most beautiful example, but it is at least something to work with.

#include <stdio.h>
#include <stdlib.h>

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <linux/videodev2.h>

#define NBUF 3

void query_capabilites(int fd)
{
    struct v4l2_capability cap;

    if (-1 == ioctl(fd, VIDIOC_QUERYCAP, &cap)) {
        perror("Query capabilites");
        exit(EXIT_FAILURE);
    }

    if (!(cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)) {
        fprintf(stderr, "Device is not a video capture device\n");
        exit(EXIT_FAILURE);
    }

    if (!(cap.capabilities & V4L2_CAP_READWRITE)) {
        fprintf(stderr, "Device does not support read i/o\n");
    }

    if (!(cap.capabilities & V4L2_CAP_STREAMING)) {
        fprintf(stderr, "Device does not support streaming i/o\n");
        exit(EXIT_FAILURE);
    }
}

int queue_buffer(int fd, int index) {
    struct v4l2_buffer bufd = {0};
    bufd.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    bufd.memory = V4L2_MEMORY_MMAP;
    bufd.index = index;
    if(-1 == ioctl(fd, VIDIOC_QBUF, &bufd))
    {
        perror("Queue Buffer");
        return 1;
    }
    return bufd.bytesused;
}
int dequeue_buffer(int fd) {
    struct v4l2_buffer bufd = {0};
    bufd.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    bufd.memory = V4L2_MEMORY_MMAP;
    bufd.index = 0;
    if(-1 == ioctl(fd, VIDIOC_DQBUF, &bufd))
    {
        perror("DeQueue Buffer");
        return -1;
    }
    return bufd.index;
}


void start_streaming(int fd) {
    unsigned int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(ioctl(fd, VIDIOC_STREAMON, &type) == -1){
        perror("VIDIOC_STREAMON");
        exit(EXIT_FAILURE);
    }
}

void stop_streaming(int fd) {
    unsigned int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(ioctl(fd, VIDIOC_STREAMOFF, &type) == -1){
        perror("VIDIOC_STREAMOFF");
        exit(EXIT_FAILURE);
    }
}

int query_buffer(int fd, int index, unsigned char **buffer) {
    struct v4l2_buffer buf = {0};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = index;
    if (-1 == ioctl(fd, VIDIOC_QUERYBUF, &buf)) {
        perror("Could not query buffer");
        return -1;
    }

    *buffer = mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, buf.m.offset);
    if (MAP_FAILED == *buffer) {
        perror("mmap");
        return -1;
    }

    return buf.length;
}

int request_buffer(int fd, int count) {
    struct v4l2_requestbuffers req = {0};
    req.count = count;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    if (-1 == ioctl(fd, VIDIOC_REQBUFS, &req))
    {
        perror("Requesting Buffer");
        exit(EXIT_FAILURE);
    }
    return req.count;
}

int set_format(int fd) {
    struct v4l2_format format = {0};
    format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    format.fmt.pix.width = 320;
    format.fmt.pix.height = 240;
    format.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    format.fmt.pix.field = V4L2_FIELD_NONE;
    int res = ioctl(fd, VIDIOC_S_FMT, &format);
    if(res == -1) {
        perror("Could not set format");
        exit(EXIT_FAILURE);
    }
    return res;
}

int main() {
    unsigned char *buffer[NBUF];
    int fd = open("/dev/video4", O_RDWR);
    int size;
    int index;
    int nbufs;

    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    query_capabilites(fd);
    set_format(fd);
    nbufs = request_buffer(fd, NBUF);
    if (nbufs > NBUF) {
        fprintf(stderr, "Increase NBUF to at least %i\n", nbufs);
        exit(EXIT_FAILURE);
    }

    for (int i = 0; i < nbufs; i++) {
        /* Assume all buffers are of equal size.. */
        size = query_buffer(fd, i, &buffer[i]);
        queue_buffer(fd, i);
    }

    start_streaming(fd);
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    struct timeval tv = {0};
    tv.tv_sec = 2;
    int r = select(fd + 1, &fds, NULL, NULL, &tv);
    if (r == -1) {
        perror("Waiting for Frame");
        exit(EXIT_FAILURE);
    }
    if (r == 0) {
        fprintf(stderr, "Timeout waiting for frame\n");
        exit(EXIT_FAILURE);
    }

    index = dequeue_buffer(fd);
    int file = open("output.raw", O_RDWR | O_CREAT, 0666);
    fprintf(stderr, "file == %i\n", file);
    write(file, buffer[index], size);

    stop_streaming(fd);

    close(file);
    close(fd);

    return 0;
}

Route traffic with NAT

A long time ago I wrote a blog post [1] about how to use NAT to route traffic for your embedded device via your host computer.

Back then we used iptables to achieve it. Nowadays nftables is the preferred successor, so it is time for an update.

What is NAT anyway?

/media/nat.png

Network Address Translation, or NAT, maps one address space into another by modifying the network address information in the IP header of each packet. This is how your router is able to route traffic from your local network out to the internet.

Sharing an internet connection this way can be very practical when working with embedded devices. The network may have restrictions/authentication that stops you from plugging your device directly into it, your traffic may have to go via a VPN connection that your host has configured, your device may only have a USB interface available... the use cases are many.

If your device has neither ethernet nor WiFi, but does have USB with OTG support, you can still share the internet connection by setting up an RNDIS gadget device.

Setup

Host setup

  • eth0 has the IP address 192.168.1.50 and is connected to the internet
  • usb0 has IP address 10.2.234.1 and is connected to the target device via RNDIS

The best way to configure nftables is by script. We will set up two rules:

  • A NAT chain that masquerades packets, and
  • A forward rule to route packets between usb0 and eth0

#!/usr/sbin/nft -f

table ip imx8_table {
        chain imx8_nat {
                type nat hook postrouting priority 0; policy accept;
                oifname "eth0" masquerade
        }

        chain imx8_forward {
                type filter hook forward priority 0; policy accept;
                iifname "usb0" oifname "eth0" accept
        }
}

We also have to enable IP forwarding. This could be done in several ways:

  • Via sysctl on command line:

    sudo sysctl -w net.ipv4.ip_forward=1
    
  • Via sysctl configuration file:

    echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
    sudo /sbin/sysctl -p
    
  • Via procfs:

    echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
    

Target setup

  • usb0 has the IP address 10.2.234.100 and is connected to the host.

You only need to make sure that all traffic is routed via usb0 by setting up a default route:

route add default gw 10.2.234.1 usb0

The iproute2 equivalent is ip route add default via 10.2.234.1 dev usb0.

That is all. You should now be able to route your traffic out to the internet:

ping www.google.se

PING www.google.se (216.58.211.3) 56(84) bytes of data.
64 bytes from muc03s13-in-f3.1e100.net (216.58.211.3): icmp_seq=1 ttl=57 time=12.4 ms
64 bytes from muc03s13-in-f3.1e100.net (216.58.211.3): icmp_seq=2 ttl=57 time=12.4 ms
64 bytes from muc03s13-in-f3.1e100.net (216.58.211.3): icmp_seq=3 ttl=57 time=12.5 ms
64 bytes from muc03s13-in-f3.1e100.net (216.58.211.3): icmp_seq=4 ttl=57 time=12.5 ms

Contiguous Memory Allocator

Introduction

I find memory management to be one of the most fascinating subsystems in the Linux kernel, and I take every chance I get to talk about it. This post is inspired by a project I'm currently working on: an embedded Linux platform with a camera connected to the CSI-2 bus.

Before we dig into which problems we could trip over, let's talk briefly about how the kernel handles memory.

Memory subsystem

The memory management subsystem handles a wide spectrum of operations which all have an impact on system performance. The subsystem is therefore divided into several parts to sustain operational efficiency and optimized resource handling for different use cases.

Such parts includes:

  • Page allocator
  • Buddy system
  • Kmalloc allocator
  • Slab caches
  • Vmalloc allocator
  • Contiguous memory allocator
  • ...

The smallest allocation unit of memory is a page frame. The Memory Management Unit (MMU) does a terrific job of arranging and mapping these page frames of the available physical memory into a virtual address space. Most allocations in the kernel are only virtually contiguous, which is fine for most use cases.

Some hardware/IP-blocks require physically contiguous memory to work, though. Direct Memory Access (DMA) transfers are one such case where memory (often) needs to be physically contiguous. Many DMA controllers now support scatter-gather, which lets you hand-pick addresses to make the memory appear contiguous, and then let the (IO)MMU do the rest.

For this to work, the hardware/IP-blocks must actually do their memory accesses through the (IO)MMU, which is not always the case.

Multimedia devices such as GPUs or VPUs often require huge blocks of physically contiguous memory and do not (with exceptions, see Raspberry Pi 4 below) make use of an (IO)MMU.

Contiguous memory

In order to meet this requirement for big chunks of physically contiguous memory, we have to reserve it from main memory during system boot.

Before CMA, we had to use the mem kernel parameter to limit how much of the system memory should be available to the allocators in the Linux system.

The memory outside this mem-region is not touched by the system and can be remapped into a linear address space by the driver.

Here is the documentation for the mem kernel parameter [1]:

mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
                Amount of memory to be used in cases as follows:

                1 for test;
                2 when the kernel is not able to see the whole
                system memory;
                3 memory that lies after 'mem=' boundary is
                excluded from the hypervisor, then
                assigned to KVM guests.
                4 to limit the memory available for kdump kernel.

                [ARC,MICROBLAZE] - the limit applies only to low memory,
                high memory is not affected.

                [ARM64] - only limits memory covered by the linear
                mapping. The NOMAP regions are not affected.

                [X86] Work as limiting max address. Use together
                with memmap= to avoid physical address space collisions.
                Without memmap= PCI devices could be placed at addresses
                belonging to unused RAM.

                Note that this only takes effects during boot time since
                in above case 3, memory may need be hot added after boot
                if system memory of hypervisor is not sufficient.

The mem parameter has a few drawbacks. The driver needs to know where the reserved memory is located, and the memory lies unused whenever the driver is not initiating any access operations.

Therefore the Contiguous Memory Allocator (CMA) was introduced to manage these reserved memory areas.

The benefit of using CMA is that this area is handled by the allocator algorithms instead of by the device driver itself. This lets both devices and the system allocate and use memory from the CMA area: through the page allocator for regular needs, and through the DMA allocation routines when DMA-capable memory is needed.

A few words about Raspberry Pi

Raspberry Pi uses a configuration file (config.txt) that is read by the GPU to initialize the system. The configuration file has many tweakable parameters, and one of them is gpu_mem.

This parameter specifies how much memory (in megabytes) to reserve exclusively for the GPU. It works pretty much like the mem kernel command line parameter described above, with the very same drawbacks. The memory reserved for the GPU is not available to the ARM CPU, so it should be kept as low as your application can work with.
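For example, to reserve 128 MB for the GPU (an illustrative value, tune it for your application), add this to config.txt:

```
# /boot/config.txt - memory reserved exclusively for the GPU, in megabytes
gpu_mem=128
```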

One big difference between the Raspberry Pi variants is that the Raspberry Pi 4 has a GPU with its own MMU, which allows the GPU to use memory that is dynamically allocated within Linux. gpu_mem can therefore be kept small on that platform.

The GPU is normally used for displays, 3D calculations, codecs and cameras. One important thing regarding the camera is that the default camera stack (libcamera) allocates its buffers from CMA memory instead of the reserved GPU memory. So if the GPU is used for camera purposes only, gpu_mem can also be kept small.

How much CMA is already reserved?

The easiest way to determine how much memory is reserved for CMA is to consult meminfo:

# grep Cma /proc/meminfo
CmaTotal:         983040 kB
CmaFree:          612068 kB

or look at the boot log:

# dmesg | grep CMA
[    0.000000] Reserved memory: created CMA memory pool at 0x0000000056000000, size 960 MiB

Reserve memory with CMA

/media/reserved.jpg

The CMA area is reserved during boot and there are a few ways to do this.

By device tree

This is the preferred way to define CMA areas.

This example is taken from the device tree bindings documentation [2]:

reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;

    /* global autoconfigured region for contiguous allocations */
    linux,cma {
        compatible = "shared-dma-pool";
        reusable;
        size = <0x4000000>;
        alignment = <0x2000>;
        linux,cma-default;
    };
};

By kernel command line

The CMA area size can also be specified on the kernel command line. There are tons of references out there stating that the command line parameter is overridden by the device tree. That sounded weird to me, so I looked it up: the kernel command line overrides the device tree, not the other way around.

At least nowadays:

static int __init rmem_cma_setup(struct reserved_mem *rmem)
{
    ...
    if (size_cmdline != -1 && default_cma) {
        pr_info("Reserved memory: bypass %s node, using cmdline CMA params instead\n",
            rmem->name);
        return -EBUSY;
    }
    ...
}

Here is the documentation for the cma kernel parameter [1]:

cma=nn[MG]@[start[MG][-end[MG]]]
                [KNL,CMA]
                Sets the size of kernel global memory area for
                contiguous memory allocations and optionally the
                placement constraint by the physical address range of
                memory allocations. A value of 0 disables CMA
                altogether. For more information, see
                kernel/dma/contiguous.c
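As an illustration of the syntax above (the sizes and range are made up), either of these reserves a 256 MiB global CMA area, the second one constrained to a physical address range:

```
cma=256M
cma=256M@512M-1G
```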

By kernel configuration

The kernel configuration can be used to set a min/max or even a percentage of the available memory that should be reserved for the CMA area:

CONFIG_CMA
CONFIG_CMA_AREAS
CONFIG_DMA_CMA
CONFIG_DMA_PERNUMA_CMA
CONFIG_CMA_SIZE_MBYTES
CONFIG_CMA_SIZE_SEL_MBYTES
CONFIG_CMA_SIZE_SEL_PERCENTAGE
CONFIG_CMA_SIZE_SEL_MIN
CONFIG_CMA_SIZE_SEL_MAX
CONFIG_CMA_ALIGNMENT

Conclusion

As soon as we use camera devices with higher resolutions and do the image manipulation in the VPU/GPU, we almost always have to increase the CMA area size. Otherwise we end up with errors like this:

cma_alloc: alloc failed, req-size: 8192 pages, ret: -12
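The error message is easier to reason about in bytes (and ret: -12 is -ENOMEM): with 4 KiB pages, the failed request above corresponds to 32 MiB of contiguous memory. A quick sanity check:

```shell
# 8192 pages of 4096 bytes each, expressed in MiB
req_pages=8192
page_size=4096
echo "$(( req_pages * page_size / 1024 / 1024 )) MiB"   # prints: 32 MiB
```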

Use custom EDID in Linux

Use custom EDID in Linux

Extended Display Identification Data (EDID) is a metadata format for display devices to describe their capabilities such as resolution, display size, timing, bit depth and update frequency. It is a 128-byte (EDID) or 256-byte (Enhanced-EDID) structure transferred from the display device over the Display Data Channel (DDC) protocol, which is a layer on top of the I2C specification.

The EDID is accessible via the I2C address 0x50 and can usually be read even if the display is turned off, which is quite nice.

Before the Video Electronics Standards Association (VESA) came up with this standard, there were multiple non-standard ways out there to provide some kind of basic identification for a video device.

Handling all these non-standard ways is of course an unmanageable situation. In the good old days we had to explicitly set all graphics parameters in the xorg.conf file.

Hooray for standards!

Read out the EDID structure

The EDID structure is available for DRM (Direct Rendering Manager) devices via sysfs in raw binary format:

$ od -A none -t x1 /sys/devices/pci0000:00/0000:00:02.0/drm/card1/card1-DP-4/edid
     00 ff ff ff ff ff ff 00 41 0c c9 c0 ae 00 00 00
     1a 17 01 03 80 3c 22 78 2a 25 95 a9 54 4f a1 26
     0a 50 54 bd 4b 00 d1 00 d1 c0 81 80 95 0f 95 00
     b3 00 81 c0 a9 40 56 5e 00 a0 a0 a0 29 50 30 20
     35 00 55 50 21 00 00 1e 00 00 00 ff 00 41 55 34
     31 33 32 36 30 30 30 31 37 34 00 00 00 fc 00 50
     68 69 6c 69 70 73 20 32 37 32 43 34 00 00 00 fd
     00 32 4c 1e 63 21 00 0a 20 20 20 20 20 20 00 ac

read_edid [1] provides tools to retrieve and interpret monitor specifications using the VESA DDC protocol. parse-edid is part of this package, and we can use it to parse the EDID structure above:

$ parse-edid < /sys/devices/pci0000:00/0000:00:02.0/drm/card1/card1-DP-4/edid
Checksum Correct

Section "Monitor"
    Identifier "Philips 272C4"
    ModelName "Philips 272C4"
    VendorName "PHL"
    # Monitor Manufactured week 26 of 2013
    # EDID version 1.3
    # Digital Display
    DisplaySize 600 340
    Gamma 2.20
    Option "DPMS" "true"
    Horizsync 30-99
    VertRefresh 50-76
    # Maximum pixel clock is 330MHz
    #Not giving standard mode: 1920x1200, 60Hz
    #Not giving standard mode: 1920x1080, 60Hz
    #Not giving standard mode: 1280x1024, 60Hz
    #Not giving standard mode: 1440x900, 75Hz
    #Not giving standard mode: 1440x900, 60Hz
    #Not giving standard mode: 1680x1050, 60Hz
    #Not giving standard mode: 1280x720, 60Hz
    #Not giving standard mode: 1600x1200, 60Hz
    Modeline        "Mode 0" +hsync +vsync
EndSection

This is the EDID for my Philips Monitor.

Provide custom EDID structure to DRM

I'm working with a custom projector board (yes, projectors are display devices too) for an embedded Linux system. Unfortunately, the processor has an erratum on the DDC channel which causes the retrieved EDID structure to be corrupt, so I have to provide the EDID information to the DRM layer manually.

For such situations, the kernel provides the CONFIG_DRM_LOAD_EDID_FIRMWARE configuration option. It lets you place individually prepared EDID data in the /lib/firmware directory to be loaded instead of retrieving it over the DDC channel. The functionality is disabled by default as it is mostly a workaround for broken hardware, and luckily enough you have to search hard to find such hardware these days.

The sources also contain a few built-in [2] structures for commonly used screen resolutions:

#define GENERIC_EDIDS 6
static const char * const generic_edid_name[GENERIC_EDIDS] = {
    "edid/800x600.bin",
    "edid/1024x768.bin",
    "edid/1280x1024.bin",
    "edid/1600x1200.bin",
    "edid/1680x1050.bin",
    "edid/1920x1080.bin",
};

See the kernel documentation [3] for more details.

Use the custom EDID structure

We can either place our custom EDID data in /lib/firmware/edid/ or use one of the built-in structures. Either way, pass drm_kms_helper.edid_firmware pointing to the right structure as an argument to the kernel.

Example bootargs that use the built-in 800x600 EDID structure:

drm_kms_helper.edid_firmware=edid/800x600.bin

Here is my projector in action showing the Qt Analog Clock [4] example.

/media/edid-clock.jpg

(Yes, crappy image, it looks much better IRL)

Audio and Embedded Linux

Audio and Embedded Linux

Brief

The last time I wrote kernel drivers for the ASoC (ALSA System on Chip) subsystem, the functionality was split up into these parts:

  • Platform class driver that defines the SoC audio interface for the actual CPU itself. This includes both the DAI (Digital Audio Interface) and any potential audio muxes (e.g. i.MX6 has its AUDMUX).
  • CODEC class driver that controls the actual CODEC.
  • Machine drivers that are the magic glue between the SoC and the CODEC, connecting the two interfaces. Such a driver had to be written for each SoC-CODEC combination, and that does not scale very well.

Nowadays, most CODEC class drivers can be described with simple-audio-card [1] in a device tree, which completely replaces the machine drivers.

The goal of this post is to describe my work to set up a 20W mono class-D audio amplifier to work with an i.MX8MM board.

General

The configuration of the CODEC is usually done over an I2C bus, even if other simple buses like SPI can be used as well. While the configuration is sent over this simple bus, the audio data is sent over a completely different bus.

Audio data could be transferred in many different formats such as AC97, PCM or I2S.

To abstract this bus and handle it in a common way, we will just call it DAI, for Digital Audio Interface.

Different SoCs of course have different names for this as well. For example, Texas Instruments has its McASP, NXP uses SSI, Atmel SSC and so on. We call it DAI throughout.

Serial audio formats

AC97

AC97 is a commonly found interface on many PC sound cards, though it is not that popular in embedded devices. It is a five-wire interface with:

  • A reset line
  • SDATA_OUT for playback
  • SDATA_IN for capture
  • BCLK as bit clock, which is always driven by the CODEC
  • SYNC for frame synchronization, driven by the controller

See the specification [4] for further reading.

I2S

I2S is a common five-wire DAI often used in embedded systems. The TX (SDOUT) and RX (SDIN) lines are used for audio transmission while the bit and frame clocks are used for synchronization.

The signals are:

  • Master clock or system clock, often referred to as MCLK, is the clock from which the other clocks are derived. It also clocks the CODEC.
  • Bit clock, often referred to as BCK or BCLK, varies depending on the sample rate.
  • Frame clock, often referred to as LRCLK (Left-Right Clock), FCLK (Frame Clock) or WCLK (Word Clock).
  • Audio out, SDOUT
  • Audio In, SDIN

The relationship between BCLK, the sample rate (LRCLK) and the audio format is

bclk = (sample rate) * Nchannels * (bit depth)

Some CODECs are able to use BCLK as their only clock, leaving MCLK optional. The CODEC we will use supports this, and it is something we have to rely on due to hardware constraints on the number of signals that fit in a connector.
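As a worked example of the formula above (using the nominal CD sample rate of 44.1 kHz, stereo, 16 bits per sample, purely as an illustration):

```shell
rate=44100      # sample rate (= LRCLK) in Hz
channels=2      # stereo
depth=16        # bits per sample
echo "$(( rate * channels * depth )) Hz"   # prints: 1411200 Hz
```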

This is an illustration of the timing on the I2S bus with 64 BCLKs per LRCLK. Borrowed from the datasheet [5]:

/media/i2s.jpg

I2S can be used with TDM format timing to support more audio channels on the same I2S bus. The timing will then look like this [5]:

/media/i2s-tdm.jpg

PCM

PCM is a 4 wire interface that is quite similar to I2S. Same same but different.

Clocks

We have several clocks: bit clock, frame clock and master clock. It is not written in stone which endpoint of the DAI should generate these clocks; it is up to us to decide.

The endpoint (either the SoC or the CODEC) that generates some or all of the clocks is called the clock master (e.g. bit clock master or frame clock master).

It is often easiest to let the CODEC generate all clocks, but some SoCs have specialized audio PLLs for this. In our case, the SoC will be the clock master.

The Hardware

The SoC

The board we are going to use is an evaluation board for an i.MX8MM module [2]. The CPU module supports two I2S buses and we are going to use one of them.

/media/sm2simx8m.jpg

The CODEC

The CODEC we will use is the TAS5720L [3] from Texas Instruments, which has been supported in mainline since v4.6.

/media/tas5720l.jpg

The TAS5720L Serial Audio Interface (SAIF) supports a variety of audio formats including I2S, left-justified and right-justified. It also supports the time-division multiplexed (TDM) format, which is capable of transporting up to 8 channels of audio data on a single bus.

It uses I2C as configuration interface.

We will use I2S with TDM as DAI and I2C as configuration interface.

The Software

As we have mostly gotten rid of the machine drivers and can describe the CODEC bindings in the device tree, the setup is mostly an exercise in device tree writing rather than C.

The device tree node to setup the sound card is simple-audio-card [6].

SAI node

The Synchronous Audio Interface (SAI) module is the hardware part of the i.MX8 SoC that is used to generate the digital audio signals.

We are going to use the SAI5 interface as it is routed out from the sm2s-imx8mm module. The node is properly configured in an include (.dtsi) file, so we only have to enable it:

&sai5 {
    status = "okay";
};

CODEC node

The TAS5720L is connected to the I2C3 bus and responds to slave address 0x6c. Besides the compatible and reg properties, the node also requires a phandle to a 3V3 supply for the digital circuitry (dvdd) and a phandle to the supply for the Class-D amplifier and analog part (pvdd).

The hardware does not have such controllable supplies so we have to create fixed regulators for that:

/ {
    reg_audio_p: regulator-audio-pwr {
        compatible = "regulator-fixed";
        regulator-name = "audio power";
        pinctrl-names = "default";
        regulator-min-microvolt = <12000000>;
        regulator-max-microvolt = <12000000>;
    };

    reg_audio_d: regulator-audio-dig {
        compatible = "regulator-fixed";
        regulator-name = "audio digital";
        pinctrl-names = "default";
        regulator-min-microvolt = <3300000>;
        regulator-max-microvolt = <3300000>;
    };
};

And the device node for the CODEC itself:

&i2c3 {

    tas5720: tas5720@6c {
            #sound-dai-cells = <0>;
            reg = <0x6c>;
            compatible = "ti,tas5720";

            dvdd-supply = <&reg_audio_d>;
            pvdd-supply = <&reg_audio_p>;
    };
};

Sound node

Now it is time to set up the sound node!

First we have to specify which audio format we intend to use by setting simple-audio-card,format to i2s.

We also have to set up the two DAIs (CPU & CODEC) that we are going to use.

This is done by creating subnodes that refer to the SAI module node and the CODEC node through the sound-dai property, respectively.

These subnodes are referred to when assigning frame-master and bitclock-master in the sound node. As we want the SoC to generate both the frame and bit clocks, set cpudai as clock master for both.

/ {
    sound-tas5720 {
        compatible = "simple-audio-card";
        simple-audio-card,name = "tas5720-audio";
        simple-audio-card,format = "i2s";
        simple-audio-card,frame-master = <&cpudai>;
        simple-audio-card,bitclock-master = <&cpudai>;

        cpudai: simple-audio-card,cpu {
            sound-dai = <&sai5>;
            clocks = <&clk IMX8MM_CLK_SAI5_ROOT>;

        };

        simple-audio-card,codec {
            sound-dai = <&tas5720>;
            clocks = <&clk IMX8MM_CLK_SAI5_ROOT>;
        };
    };
};

Sound test

Now we should have everything in place!

Let's use speaker-test, which is part of alsa-utils [8], to test our setup.

root@imx8board:~# speaker-test

speaker-test 1.2.5.1

Playback device is default
Stream parameters are 44000Hz, S16_LE, 1 channels
Using 16 octaves of pink noise
[   12.257438] fsl-sai 30050000.sai: failed to derive required Tx rate: 1411200

That did not turn out well.

Debug clock signals

Let's look at what our clock tree looks like:

root@imx8board:~# cat /sys/kernel/debug/clk/clk_summary
    ...
    audio_pll2_ref_sel                0        0        0    24000000          0     0  50000
       audio_pll2                     0        0        0   361267200          0     0  50000
          audio_pll2_bypass           0        0        0   361267200          0     0  50000
             audio_pll2_out           0        0        0   361267200          0     0  50000
    audio_pll1_ref_sel                0        0        0    24000000          0     0  50000
       audio_pll1                     0        0        0   393216000          0     0  50000
          audio_pll1_bypass           0        0        0   393216000          0     0  50000
             audio_pll1_out           0        0        0   393216000          0     0  50000
                sai5                  0        0        0    24576000          0     0  50000
                   sai5_root_clk       0        0        0    24576000          0     0  50000
    ...

The sai5 clock is running at 24576000 Hz, and indeed, it is hard to find a working clock divider to get 1411200 Hz.

audio_pll2 @ 361267200 looks better: 361267200/1411200 = 256, a perfect fit!
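The divider math is easy to double-check:

```shell
pll=361267200    # audio_pll2 rate in Hz
bclk=1411200     # required Tx rate in Hz
echo $(( pll / bclk ))   # prints: 256
echo $(( pll % bclk ))   # prints: 0 -- it divides evenly
```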

Then we need to reparent the sai5 clock; this is done in the device tree as well:

&sai5 {
    status = "okay";
    assigned-clock-parents = <&clk IMX8MM_AUDIO_PLL2_OUT>;
    assigned-clock-rates = <11289600>;
};

Here is our new clock tree:

root@imx8board:~# cat /sys/kernel/debug/clk/clk_summary
    ...
    audio_pll2_ref_sel                0        0        0    24000000          0     0  50000
       audio_pll2                     0        0        0   361267200          0     0  50000
          audio_pll2_bypass           0        0        0   361267200          0     0  50000
             audio_pll2_out           0        0        0   361267200          0     0  50000
                sai5                  0        0        0    11289600          0     0  50000
                   sai5_root_clk       0        0        0    11289600          0     0  50000
    ...

We can see that the frequency is right and also that we now derive our clock from audio_pll2_out instead of audio_pll1.

The speaker-test software is also happier:

root@imx8board:~# speaker-test

speaker-test 1.2.5.1

Playback device is default
Stream parameters are 44000Hz, S16_LE, 1 channels
Using 16 octaves of pink noise
Rate set to 44000Hz (requested 44000Hz)
Buffer size range from 3840 to 5760
Period size range from 1920 to 1920
Using max buffer size 5760
Periods = 4
was set period_size = 1920
was set buffer_size = 5760
 0 - Front Left

Great!

Use BCLK as MCLK

Due to my hardware constraints, I need to use the bit clock as master clock. If we look in the datasheet [5]:

/media/tas5720-1.png

If the BCLK-to-LRCLK ratio is 64, we can tie MCLK directly to our BCLK!

We already know our BCLK; it is 1411200 Hz, and the frame clock (LRCLK) is the same as the sample rate (44 kHz). We can verify that with an oscilloscope.

Bitclock:

/media/bitclock1.png

Frameclock:

/media/frameclock.png

That is not a ratio of 64.

There is not much to do about the frame clock; it will stick to the sample rate. If we make use of TDM though, we can make the bit clock run faster with the same frame clock!

Let's add 2 TDM slots @ 32-bit width:

/ {
    sound-tas5720 {
        compatible = "simple-audio-card";
        simple-audio-card,name = "tas5720-audio";
        simple-audio-card,format = "i2s";
        simple-audio-card,frame-master = <&cpudai>;
        simple-audio-card,bitclock-master = <&cpudai>;

        cpudai: simple-audio-card,cpu {
            sound-dai = <&sai5>;
            clocks = <&clk IMX8MM_CLK_SAI5_ROOT>;
            dai-tdm-slot-num = <2>;
            dai-tdm-slot-width = <32>;
        };

        simple-audio-card,codec {
            sound-dai = <&tas5720>;
            clocks = <&clk IMX8MM_CLK_SAI5_ROOT>;
        };
    };
};

Verify the bitclock:

/media/bitclock1.png

Let's calculate: 2820000/44000 ≈ 64! We have reached our goal!
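With the nominal numbers (a 44.1 kHz frame clock and two 32-bit TDM slots), the expected bit clock and BCLK/LRCLK ratio can be calculated, which agrees with the scope measurement within tolerance:

```shell
rate=44100   # frame clock (LRCLK) in Hz
slots=2      # TDM slots
width=32     # slot width in bits
bclk=$(( rate * slots * width ))
echo "$bclk"                # prints: 2822400
echo "$(( bclk / rate ))"   # prints: 64
```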

Final device tree setup

This is what the final device tree looks like:

/ {
    sound-tas5720 {
        compatible = "simple-audio-card";
        simple-audio-card,name = "tas5720-audio";
        simple-audio-card,format = "i2s";
        simple-audio-card,frame-master = <&cpudai>;
        simple-audio-card,bitclock-master = <&cpudai>;

        cpudai: simple-audio-card,cpu {
            sound-dai = <&sai5>;
            clocks = <&clk IMX8MM_CLK_SAI5_ROOT>;
            dai-tdm-slot-num = <2>;
            dai-tdm-slot-width = <32>;
        };

        simple-audio-card,codec {
            sound-dai = <&tas5720>;
            clocks = <&clk IMX8MM_CLK_SAI5_ROOT>;
        };
    };

    reg_audio_p: regulator-audio-pwr {
        compatible = "regulator-fixed";
        regulator-name = "audio power";
        pinctrl-names = "default";
        regulator-min-microvolt = <12000000>;
        regulator-max-microvolt = <12000000>;
    };

    reg_audio_d: regulator-audio-dig {
        compatible = "regulator-fixed";
        regulator-name = "audio digital";
        pinctrl-names = "default";
        regulator-min-microvolt = <3300000>;
        regulator-max-microvolt = <3300000>;
    };

};

&i2c3 {

    tas5720: tas5720@6c {
            #sound-dai-cells = <0>;
            reg = <0x6c>;
            compatible = "ti,tas5720";

            dvdd-supply = <&reg_audio_d>;
            pvdd-supply = <&reg_audio_p>;
    };
};

&sai5 {
    status = "okay";
    assigned-clock-parents = <&clk IMX8MM_AUDIO_PLL2_OUT>;
    assigned-clock-rates = <11289600>;
};

Conclusion

simple-audio-card is a flexible way to describe the audio routing, and I strongly prefer it over writing a machine driver for each SoC-CODEC setup.

My example here is kept to a minimum; you probably want to add widgets and routing as well.

simple-audio-card supports rather complex setups with multiple DAI links, amplifiers and such. See the device tree bindings [6] for further reading.

Debug kernel with KGDB

Debug kernel with KGDB

What is KGDB?

KGDB is intended to be used as a source-level debugger on a running Linux kernel. It works with GDB and allows the user to inspect memory and variables, set breakpoints, and step through lines and instructions. Pretty much the same thing all application developers are used to, but for the kernel itself.

Almost every embedded Linux system has a serial port available, and that is all you need to connect GDB to your kernel.

One thing to keep in mind, as with all debugging, is that all timing will be messed up. That becomes pretty obvious when you pause a running kernel that keeps up communication with hardware. Especially if you have any hardware watchdogs enabled...

Compile the kernel with support for KGDB

There are a few kernel options that you have to enable in order to use KGDB:

  • CONFIG_KGDB to enable remote debugging.
  • CONFIG_KGDB_SERIAL_CONSOLE lets you share a serial console with GDB.
  • CONFIG_FRAME_POINTER is used to produce more reliable stack backtraces by inserting code to preserve the frame information in registers or on the stack.
  • CONFIG_KALLSYMS_ALL to make sure that all symbols are loaded into the kernel image (i.e. symbols from all sections).
  • CONFIG_MAGIC_SYSRQ to be able to issue SysRq commands. More about this below.

KGDBOC

KGDB over console, or kgdboc, lets you use a console port as the debugging port. If we only have one serial port available, we can split the console and GDB communication using agent-proxy [2].

Agent-proxy

To split a serial port into console and GDB communication we can use agent-proxy. Download and compile it:

git clone http://git.kernel.org/pub/scm/utils/kernel/kgdb/agent-proxy.git
cd agent-proxy
make

Launch agent-proxy:

agent-proxy 4440^4441 0 /dev/ttyS0,115200

If your hardware does not support the line break sequence, you have to add the -s003 option. You will find out pretty soon if it is needed: if your target continues to run after sending a break, then you should try adding it, in other words:

agent-proxy 4440^4441 0 /dev/ttyS0,115200 -s003

Where ttyS0 is the serial port on your host.

This will create two TCP sockets, one for the serial console and one for GDB, listening on ports 4440 and 4441 respectively.

Connect to the serial console with your favorite client (socat, netcat, telnet...):

telnet localhost 4440

Setup kgdboc with kernel arguments

kgdboc can be used early in the boot process if it is compiled into the kernel as a built-in (not as a module), by providing the kgdboc arguments.

Add kgdboc=<tty-device>,[baud] to your command line arguments, e.g.

kgdboc=ttyS0,115200

Where ttyS0 is the serial port on the target.

The kgdbwait argument stops the kernel execution and enters the kernel debugger as early as possible. This lets you connect to the running kernel with GDB.

See kernel parameters [1] for more information.

Setup kgdboc with kernel module

If kgdboc is not compiled as a built-in but as a module, you provide the same arguments when loading the module:

modprobe kgdboc kgdboc=ttyS0,115200

Setup kgdboc in runtime using sysfs

It is also possible to enable kgdboc at runtime by echoing parameters into sysfs entries:

echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc

Connect GDB to a running kernel

Stop execution and wait for debugger

We have to stop the execution of the kernel in order to connect with GDB.

If kgdbwait is provided as a boot argument, the kernel will stop its execution and wait.

Otherwise we have to trigger this manually by using SysRq-G. This requires that CONFIG_MAGIC_SYSRQ is enabled in your kernel config.

Your favorite serial application probably has some keyboard combination to send SysRq requests (GNU Screen has "CTRL+A b", for example); otherwise you can use procfs to send the trigger:

echo g > /proc/sysrq-trigger

Connect with GDB

Start GDB and provide the vmlinux from your kernel root directory; remember that you have to use the GDB that came with your toolchain. I always use the -tui flag to start with a nice terminal user interface:

aarch64-linux-gnu-gdb -tui ./vmlinux

Now, if you have a separate serial port for GDB, you can connect to it directly:

(gdb) set serial baud 115200
(gdb) target remote /dev/ttyS0

If you are using agent-proxy, then we should connect to port 4441:

(gdb) target remote localhost:4441

Now you are able to set breakpoints, watch variables and use GDB as you are used to.

/media/kgdb.jpg

One tip is to set a breakpoint at ksys_sync:

(gdb) b ksys_sync

This lets you use the sync command as a trigger to enter debug mode.

Raspberry Pi and QEMU

Raspberry Pi and QEMU

What is QEMU?

QEMU is a generic and open source machine emulator and virtualizer. It emulates full machines (boards) of different architectures and is useful for both application and kernel development. The CPU itself can be fully emulated (together with devices, memory and so on) or work with a hypervisor such as KVM or Xen.

/media/qemu-logo.png

If support for your hardware is missing, then it is a fairly easy task to write a stub driver that your application can interface with. The most fun part for kernel development is to connect GDB (CONFIG_KGDB) to a running kernel, set breakpoints and step through the kernel code.

What about Raspberry Pi and QEMU?

My normal procedure is to build a custom root filesystem with Buildroot [1], clone the kernel source code and build the kernel. This is also my preferred workflow as I have control over what I'm actually running, but sometimes it can be handy to just take an already built setup and use it.

I'm currently doing some work with the Raspberry Pi 3B+, which runs a Raspbian image, so why not emulate it?

Board support

QEMU provides models of the following Raspberry Pi Boards:

Machine    Core           Number of cores   RAM
raspi0     ARM1176JZF-S   1                 512 MiB
raspi1ap   ARM1176JZF-S   1                 512 MiB
raspi2b    Cortex-A7      4                 1 GiB
raspi3ap   Cortex-A53     4                 512 MiB
raspi3b    Cortex-A53     4                 1 GiB

Device support

QEMU provides support for the following devices:

  • ARM1176JZF-S, Cortex-A7 or Cortex-A53 CPU
  • Interrupt controller
  • DMA controller
  • Clock and reset controller (CPRMAN)
  • System Timer
  • GPIO controller
  • Serial ports (BCM2835 AUX - 16550 based - and PL011)
  • Random Number Generator (RNG)
  • Frame Buffer
  • USB host (USBH)
  • SD/MMC host controller
  • SoC thermal sensor
  • USB2 host controller (DWC2 and MPHI)
  • MailBox controller (MBOX)
  • VideoCore firmware (property)

However, it still lacks support for these:

  • Peripheral SPI controller (SPI)
  • Analog to Digital Converter (ADC)
  • Pulse Width Modulation (PWM)

Set it up

Prerequisites

You will need qemu-system-aarch64; you can either build it from source [2] or let your Linux distribution install it for you.

If you are using Arch Linux, you can use pacman:

sudo pacman -Sy qemu-system-aarch64

You will also need to download and extract the Raspbian image you want to use:

wget https://downloads.raspberrypi.org/raspios_lite_arm64/images/raspios_lite_arm64-2022-09-26/2022-09-22-raspios-bullseye-arm64-lite.img.xz
unxz 2022-09-22-raspios-bullseye-arm64-lite.img.xz

Loopback mount image

The image can be loopback mounted in order to extract the kernel and device tree. First we need to figure out the first free loopback device:

sudo losetup -f
/dev/loop8

Then we can use that device to attach the image:

sudo losetup /dev/loop8 ./2022-09-22-raspios-bullseye-arm64-lite.img -P

The -P option forces the kernel to scan the partition table. As the sector size of the image is 512 bytes, we can omit --sector-size.

Mount the boot partition and the root filesystem:

mkdir ./boot ./rootfs
sudo mount /dev/loop8p1 ./boot/
sudo mount /dev/loop8p2 ./rootfs/

Copy kernel and dtb

cp boot/bcm2710-rpi-3-b.dtb .
cp boot/kernel8.img .

If you have any modifications you want to make to the root filesystem, do them now before we unmount everything.

sudo umount ./boot/
sudo umount ./rootfs/

Resize image

QEMU requires the image size to be a power of two, so resize the image to 2 GiB:

qemu-img resize ./2022-09-22-raspios-bullseye-arm64-lite.img 2G

Note that this will lose data if you make the image smaller than it currently is.
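If you are unsure whether a given size qualifies, the classic bit trick n & (n - 1) == 0 tells you whether n is a power of two:

```shell
size=$(( 2 * 1024 * 1024 * 1024 ))   # 2 GiB in bytes
if [ $(( size & (size - 1) )) -eq 0 ]; then
    echo "power of two"
fi
```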

Wrap it up

Everything is now ready to start QEMU. The parameters are quite self-explanatory:

qemu-system-aarch64 \
    -M raspi3b \
    -append "rw earlyprintk loglevel=8 console=ttyAMA0,115200 root=/dev/mmcblk0p2 rootdelay=1" \
    -serial stdio \
    -dtb ./bcm2710-rpi-3-b.dtb \
    -sd ./2022-09-22-raspios-bullseye-arm64-lite.img \
    -kernel kernel8.img \
    -m 1G -smp 4

Here we go

raspberrypi login: pi
Password:
Linux raspberrypi 5.10.103-v8+ #1529 SMP PREEMPT Tue Mar 8 12:26:46 GMT 2022 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
pi@raspberrypi:~$

Conclusion

QEMU is fun. It is a good way to explore stuff like ftrace, step through kernel code or simply run your application on a virtual board. You do not always have to build everything yourself; sometimes a Raspbian image could be just what you need.

Crosscompile libcamera for RPi

Crosscompile libcamera for RPi

Goal

The goal is to cross-compile libcamera [1] and libcamera-apps [2] for Raspberry Pi using the latest Raspbian [3] (Bullseye) release. Usually you set up the root filesystem with Buildroot [4] or Yocto [5] and generate an SDK that you can use to compile your application. The Raspbian distribution does not come with an SDK, so we have to set up our own.

We will use a Raspberry Pi 3b for this.

What is libcamera?

You can read about libcamera in a previous post [6].

Prepare the SD-card

We will boot the Raspberry Pi from a SD-card, so we first have to prepare it.

Download Raspbian Bullseye

You can either download the image [3] yourself or use rpi-imager. Just make sure that you choose the 64-bit version, as that is what the toolchain we are going to use is built for.

To download and flash it yourself

wget https://downloads.raspberrypi.org/raspios_lite_arm64/images/raspios_lite_arm64-2022-09-26/2022-09-22-raspios-bullseye-arm64-lite.img.xz
unxz 2022-09-22-raspios-bullseye-arm64-lite.img.xz
sudo dd if=2022-09-22-raspios-bullseye-arm64-lite.img of=/dev/mmcblk0 conv=fsync

/dev/mmcblk0 will be overwritten, so please double-check that this is the SD-card dedicated for your Raspberry Pi and not your wedding pictures.

I will stick to rpi-imager as it lets you configure WLAN, enable SSH and set passwords in the configuration menu. I find it smooth.

/media/rpi-imager.jpg

As I'm using Arch Linux, rpi-imager is available as an AUR package

git clone https://aur.archlinux.org/rpi-imager.git
cd rpi-imager
makepkg -i

Other Debian-based distributions can just use apt-get

apt-get install rpi-imager

Enable UART

Once the image is written to the SD-card we can enable the UART. This is not necessary but I strongly prefer to have a serial port connected to the device I'm working with to see the boot log and get a getty.

Mount the boot partition and enable the UART by writing "enable_uart=1" to config.txt

mkdir -p mnt/disk
sudo mount /dev/mmcblk0p1 mnt/disk/
echo enable_uart=1 | sudo tee -a mnt/disk/config.txt
sudo umount mnt/disk

Permit root login on SSH

Now it is time to power up the Raspberry Pi. Put the SD-card in the slot, power it up and login as the pi user, either via UART or SSH.

Permit root login in order to mount the root filesystem via SSH

echo PermitRootLogin yes  | sudo tee -a /etc/ssh/sshd_config

Restart the SSH service

sudo systemctl restart sshd.service

Note that it is bad practice to let the root user log in via SSH (especially with a password). Either use SSH keys or disable it later.

If you have not already figured out which IP address the RPi has, grab it with ip addr

pi@raspberrypi:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:27:eb:91:e6:2a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.111/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
       valid_lft 409sec preferred_lft 334sec
    inet6 fe80::cfe6:1f35:c5b6:aa1a/64 scope link
       valid_lft forever preferred_lft forever
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:27:eb:c4:b3:7f brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.56/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
       valid_lft 408sec preferred_lft 333sec
    inet6 fe80::61e6:aeb:2c0e:f31b/64 scope link
       valid_lft forever preferred_lft forever

My board has the IP 192.168.1.111 on the eth0 interface.

Set root password

The root user does not have a password by default, so set one

sudo passwd root

Install dependencies

libcamera-apps in particular has a lot of dependencies; install them using apt-get

sudo apt-get install libboost-program-options-dev libboost-dev libexif-dev libjpeg-dev libtiff-dev libpng-dev libdrm-dev libavcodec-dev libavdevice-dev

Prepare the host

Now we have everything on the target in place, so we can switch back to the host system.

From now on, we will use environment variables to setup paths to all directories we will refer to. First, create those directories

mkdir rootfs staging tools

rootfs will be used for our network-mounted root filesystem, staging will be the sysroot we compile against, and tools will contain our cross toolchain.

Export the environment variables

export RPI_BASE=`pwd`
export RPI_ROOTFS=$RPI_BASE/rootfs
export RPI_STAGING=$RPI_BASE/staging
export RPI_TOOLS=$RPI_BASE/tools/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/

Install cross toolchain

Download a cross toolchain with the same GCC version (10.2) as Raspbian uses.

wget http://sources.buildroot.net/toolchain-external-arm-aarch64/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz
tar -xf gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz -C tools/

Add the toolchain to your $PATH

export PATH=$RPI_TOOLS:$PATH

Mount root filesystem

SSHFS (Secure SHell File System) [7] is a handy tool based on libfuse that lets you mount a filesystem over SSH.

Mount the root filesystem

sshfs root@192.168.1.111:/ $RPI_ROOTFS

Prepare the staging directory

Here we will copy files from our root filesystem

mkdir -p $RPI_STAGING/usr/
cp -r $RPI_ROOTFS/usr/lib $RPI_STAGING/usr/
cp -r $RPI_ROOTFS/usr/include/ $RPI_STAGING/usr/

Now you could use ./staging as your sysroot. However, as the Raspbian image ships C libraries for both musl and glibc, the search paths become hard to handle. We will look at how this could be handled later, but in my opinion it is simply not worth it. Instead, copy the glibc libraries to /usr/lib

cp $RPI_STAGING/usr/lib/aarch64-linux-gnu/* $RPI_STAGING/usr/lib/

We also need to create a symlink to /lib, as the toolchain looks for the Linux loader in that directory

ln -s usr/lib/ $RPI_STAGING/lib

The libpthread.so symlink points to an absolute path (which resolves to our host system). As a result, the linker does not find the library and falls back to the statically linked one instead. That will not end well, as glibc is still dynamically linked... so create a new symlink

ln -sf libpthread.so.0 $RPI_STAGING/lib/libpthread.so
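
The reason the link targets are relative matters: a relative symlink resolves inside whatever tree it lives in, so the staging directory stays self-contained. A quick sketch to convince yourself (the mktemp directory below just stands in for $RPI_STAGING):

```shell
# A relative symlink target resolves inside the staging tree, while an
# absolute one would escape to the host root. Demonstrated in a temp dir.
staging=$(mktemp -d)
mkdir -p "$staging/usr/lib"
ln -s usr/lib "$staging/lib"    # relative target, like in the article
readlink -f "$staging/lib"      # resolves inside $staging, not to the host's /usr/lib
rm -rf "$staging"
```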

Cross compile libcamera

Cross-compiling libcamera is quite straightforward, as it uses Meson as its build system and does not have any external dependencies.

Clone the repository

git clone https://git.libcamera.org/libcamera/libcamera.git
cd libcamera

We need to create a toolchain file to instruct the Meson build system which toolchain it should use. Create aarch64.txt which contains

[binaries]
c = 'aarch64-none-linux-gnu-gcc'
cpp = 'aarch64-none-linux-gnu-g++'
ar = 'aarch64-none-linux-gnu-ar'
strip = 'aarch64-none-linux-gnu-strip'

[host_machine]
system = 'linux'
cpu_family = 'aarch64'
cpu = 'aarch64'
endian = 'little'

Now we can build

meson build -Dprefix=/usr/ --cross-file ./aarch64.txt
cd build
ninja

Install on both root filesystem and staging directory

DESTDIR=$RPI_ROOTFS ninja install
DESTDIR=$RPI_STAGING ninja install

Cross compile libcamera-apps

libcamera-apps, on the other hand, has a lot of dependencies, but as we have collected all of them into the staging directory, even this is pretty straightforward.

Just for your information: if we had not copied the glibc libraries from /usr/lib/aarch64-linux-gnu to /usr/lib, this would not be a straightforward trip at all.

Clone the repository

git clone https://github.com/raspberrypi/libcamera-apps.git
cd libcamera-apps

I'm not interested in the preview application as it needs Qt5, so I will remove it

sed -i "/add_subdirectory(preview)/d" CMakeLists.txt

I also removed libcamera-raw and libcamera-vid from apps/CMakeLists.txt, as those have dependencies that I do not want on my target.

libcamera-apps uses CMake as its build system, and we need to create a toolchain file for it as well. Create the file aarch64.cmake which contains

set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR arm)

set(CMAKE_SYSROOT $ENV{RPI_STAGING})

#We need to point to the glibc-headers
set(CMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES $ENV{RPI_STAGING}/usr/include/aarch64-linux-gnu/)

set(CMAKE_C_COMPILER $ENV{RPI_TOOLS}/aarch64-none-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER $ENV{RPI_TOOLS}/aarch64-none-linux-gnu-g++)

#Let pkg-config look in our sysroot
set(ENV{PKG_CONFIG_LIBDIR} ${CMAKE_SYSROOT}/lib/pkgconfig/)

set(CMAKE_FIND_ROOT_PATH "${CMAKE_SYSROOT}")
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)

Now we are ready to build

mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=../aarch64.cmake ../
make

And install

make DESTDIR=$RPI_ROOTFS install

We only need to install to our root filesystem, as the apps will run on target.

Test on target

We can now run the applications on target and verify that it is our newly built app:

pi@raspberrypi:/usr/local/bin$ LD_LIBRARY_PATH=/usr/local/lib ./libcamera-jpeg  --version
libcamera-apps build: 80f17befef34-intree-dirty 26-11-2022 (15:55:47)
libcamera build: v0.0.2+34-b35f04b3

For good measure, we can also take an image

pi@raspberrypi:/usr/local/bin$ LD_LIBRARY_PATH=/usr/local/lib ./libcamera-jpeg  -o /tmp/test1.jpg
/media/libcamera-rpi-small.jpg

Now you can disable root login for SSH

sudo sed -i "/PermitRootLogin/d" /etc/ssh/sshd_config
sudo systemctl restart sshd.service

What if...

...we did not copy /usr/lib/aarch64-linux-gnu to /usr/lib but kept it as it is? It would be nice to get rid of the staging directory and only use the root filesystem for cross-compiling.

That was my first intention, but it turned out to be really troublesome, even though I got it to work in the end.

In short, what needs to be done is:

  • Export PKG_CONFIG_SYSROOT_DIR to $RPI_ROOTFS
  • Export PKG_CONFIG_PATH to $RPI_ROOTFS/usr/lib/aarch64-linux-gnu/pkgconfig to make it find pkgconfig files
  • Export BOOST_ROOT to $RPI_ROOTFS to give FindBoost.cmake a hint of where to look
  • Export BOOST_LIBRARYDIR to $RPI_ROOTFS/usr/lib/aarch64-linux-gnu
  • Set CMAKE_LIBRARY_PATH to look into $RPI_ROOTFS/usr/lib/aarch64-linux-gnu
  • Set CMAKE_CXX_FLAGS_INIT, CMAKE_C_FLAGS_INIT and CMAKE_EXE_LINKER_FLAGS to search $RPI_ROOTFS/usr/lib/aarch64-linux-gnu for libraries
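
The list above could be sketched roughly as the following environment setup, assuming the same $RPI_ROOTFS variable as before. This is untested and only meant to make the list concrete:

```shell
# Hedged sketch of compiling directly against the rootfs; all paths
# assume the $RPI_ROOTFS layout used earlier in this post.
export PKG_CONFIG_SYSROOT_DIR=$RPI_ROOTFS
export PKG_CONFIG_PATH=$RPI_ROOTFS/usr/lib/aarch64-linux-gnu/pkgconfig
export BOOST_ROOT=$RPI_ROOTFS
export BOOST_LIBRARYDIR=$RPI_ROOTFS/usr/lib/aarch64-linux-gnu
# CMAKE_LIBRARY_PATH and the *_FLAGS_INIT variables would additionally be
# set in the CMake toolchain file to search $RPI_ROOTFS/usr/lib/aarch64-linux-gnu
```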

There are simply too many special cases, and I do not think it is worth it.

Conclusion

This sounds like a simple thing, but it actually took quite a while to get working. Mostly because the Raspbian distribution ships both glibc and musl, so libraries, pkgconfig files and headers end up in non-standard search paths.

I find it quite strange that there is no SDK available for the Raspbian images; it would help development for the platform a lot.

What is libcamera and why should you use it?

What is libcamera and why should you use it

Read out a picture from camera

Once upon a time, video devices were not that complex. To use a camera back then, your application could iterate through the /dev/video* devices, pick the camera you wanted and immediately start using it. You could query which pixel formats, frame rates, resolutions and other properties are supported by the camera, and you could easily change them if you wanted.

This still works for some cameras; basically every USB camera and most laptop cameras still work that way.

The problem, especially in embedded systems, is that there is no such thing as "the camera" anymore. The camera system is rather a complex pipeline of different image processing nodes that the image data traverses to be shaped the way you want. Even if the result of this pipeline ends up in a video device, you cannot configure things like cropping or resolution directly on that device as you used to. Instead, you have to use the media controller API to configure and link each of these nodes to build up your pipeline.

To show how it may look, here is a graph from a previous post [3]:

/media/media-ctl-graph.png

What is libcamera?

/media/libcamera-banner.png

This is how libcamera is described on their website [1]

libcamera is an open source camera stack for many platforms with a core userspace library, and support from the Linux kernel APIs and drivers already in place.
It aims to control the complexity of embedded camera hardware by providing an intuitive API and method of separating untrusted vendor code from the open source core.

libcamera aims to encourage the development of new embedded camera applications by limiting the complexity that developers have to deal with.
The interface is designed around the way that modern embedded camera hardware works.

The first time I heard about libcamera was at the Embedded Linux Conference 2019, where Jacopo Mondi had a talk [2] about the public API for the first stable libcamera release. I have been working with cameras in several embedded Linux products and know for sure how complex [3] these little beasts can be. The configuration also differs depending on which platform or camera you are using, as there is no common way to set up the image pipeline. You will soon have special cases for all your platform variants in your application, which is not what we strive for.

libcamera tries to solve this by providing one library that takes care of all that complexity for you.

For example, say you want to adjust a simple thing, like the contrast, of an IMX219 camera module connected to a Raspberry Pi. To do that without libcamera, you first have to set up a proper image pipeline that takes the camera module and connects it to the several ISP (Image Signal Processing) blocks that your processor offers, in order to get the right image format, resolution and so on. Somewhere in the middle of all this configuring, you realise that neither the camera module nor the ISPs support adjusting the contrast. Too bad. To achieve it, you have to run the value through a self-written contrast algorithm, create a gamma curve that the IPA (Image Processing Algorithm) understands and actually set gamma. Yes, the contrast is adjusted with a gamma curve for that particular camera on the Raspberry Pi. (Have a look at the implementation of that IPA block [7] for Raspberry Pi.)

This is exactly the stuff libcamera understands and abstracts for the user. libcamera figures out which graph it has to build depending on what you want to do and which processing operations are available at your various nodes. An application using libcamera sets contrast the same way for all cameras and platforms. After all, that is what you wanted.

Camera Stack

As the libcamera library is fully implemented in userspace and uses existing kernel interfaces to communicate with the hardware, you need no extra underlying support in terms of separate drivers or kernel patches.

libcamera itself exposes several APIs depending on how the application wants to interface with the camera. It even has a V4L2 compatibility layer that emulates a high-level V4L2 camera device, making for a smooth transition for all those V4L2 applications out there.

/media/libcamera-layer.png

Read more about the camera stack in the libcamera documentation [4].

Conclusion

I really like this project and I think we need an open-source stack that supports many platforms. The vendor-specific drivers/libraries/IPAs situation we are in right now is not sustainable at all. It takes too much effort to evaluate a few cameras from different vendors, just because every vendor has its own way to control the camera with its own closed-source and platform-specific layers. Been there, done that.

For those vendors that do not want to open-source their secret image processing algorithms, libcamera uses a plugin system for IPA modules which lets vendors keep their secrets but still be compatible with libcamera. Open-source modules are identified by digital signatures, while closed-source modules are isolated inside a sandbox environment with restricted access to the system. A win-win concept.

The project itself is still quite young and needs more work to support more platforms and cameras, but the foundation is stable. The Raspberry Pi is now a commonly used platform, both commercially and among hobbyists, and the fact that the Raspberry Pi Foundation has chosen libcamera as their primary camera stack [8] must tell us something.