NAT with Linux

Sharing an internet connection is sometimes very practical when working with embedded devices. The network may have restrictions or require authentication that stops you from plugging your device into the network of the big company you are working for.

But what about creating your own network and using your computer as a NAT (Network Address Translation) gateway?
I was surprised how easy it is to set up your Linux host as a NAT – it is just a few command lines.

OK, here is the setup on Host:
– eth0 has IP address 192.168.1.50 and is connected to the company network
– eth1 has IP address 10.2.234.1 and is connected to the target

Setup on Target:
– eth0 has IP address 10.2.234.100 and is connected to the host

First of all, we need to set up a default gateway on our target. Do this as you always do – with route:

Target$ route add default gw 10.2.234.1 eth0

Next, we need to create a post-routing rule in the NAT table that masquerades all traffic leaving through the eth0 interface.
iptables is your friend:

Host$ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

That's it! Well, almost. We just need to enable IP forwarding:

Host$ echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
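Note that neither the iptables rule nor the ip_forward flag survives a reboot. If you want the setup to be persistent, one common approach (assuming a sysctl.conf-based distribution) is:

Host$ echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf
Host$ sudo sysctl -p

The firewall rule can be stored with iptables-save and restored at boot with iptables-restore.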

High resolution timers

Nearly all systems have some kind of Programmable Interrupt Timer (PIT) or High Precision Event Timer (HPET) that is programmed to periodically interrupt the operating system (if not configured with CONFIG_NO_HZ). The kernel performs several tasks on each of these ticks, such as timekeeping, calculating statistics for the currently running process, scheduling a new process and so on.
The interrupt occurs at regular intervals – exactly HZ times per second. HZ is architecture specific and defined in the architecture's asm/param.h.

Jiffies is a central concept when talking about time management in the Linux kernel. A jiffy is simply the time between two ticks – more precisely, 1/HZ seconds.
HZ has a typical value of 250 on IA-32/AMD64 architectures, and 100 on smaller systems such as ARM.

Most of the time management in the Linux kernel is based on jiffies, even the timer_list timers (also known as low resolution timers).
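Kernel code rarely works with raw jiffy values; there are conversion and comparison helpers in include/linux/jiffies.h. A minimal sketch (the 100 ms limit is just an arbitrary example):

#include <linux/jiffies.h>

/* Check whether at least 100 ms have passed since 'start' (a jiffies value) */
static bool waited_long_enough(unsigned long start)
{
	return time_after(jiffies, start + msecs_to_jiffies(100));
}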

High resolution timers (hrtimers) in the Linux kernel are timers that do not use a time specification based on jiffies, but employ nanosecond time stamps. In fact, the low resolution timers are implemented on top of the high resolution mechanism, but that is another story.
The components of the hrtimer framework that are not universally applicable (i.e. not used by the low resolution timers) are enabled by CONFIG_HIGH_RES_TIMERS in the kernel configuration.

Setting up a timer

Using hrtimers is really simple.

  1. Initialize a struct hrtimer with

    hrtimer_init(struct hrtimer *timer, clockid_t which_clock, enum hrtimer_mode mode);
    

timer is a pointer to the instance of struct hrtimer.
which_clock is the clock to bind the timer to, often CLOCK_MONOTONIC or CLOCK_REALTIME.
mode specifies whether the timer works with absolute or relative time values. Two constants are available: HRTIMER_MODE_ABS and HRTIMER_MODE_REL.

  2. Set a callback function with:

mytimer.function = my_callback;

Where my_callback is declared as:

enum hrtimer_restart my_callback(struct hrtimer *timer)

The callback returns HRTIMER_RESTART if the timer should be restarted (after the expiry has been moved forward), or HRTIMER_NORESTART if it should not.

  3. Start the timer with hrtimer_start:

ktime_t delay = ktime_set(5, 0);
hrtimer_start(&mytimer, delay, HRTIMER_MODE_REL);

ktime_set initializes delay with 5 seconds and 0 nanoseconds. Note that the third argument to hrtimer_start is the timer mode, not the clock – the clock was already given to hrtimer_init.

Wait, and the callback function will be called after 5 seconds!

A full example

#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer mytimer;

static enum hrtimer_restart my_callback(struct hrtimer *timer)
{
	printk(KERN_INFO "Hello from timer!\n");
	return HRTIMER_NORESTART;
}

static int __init mytimer_init(void)
{
	ktime_t delay = ktime_set(5, 0);

	hrtimer_init(&mytimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	mytimer.function = my_callback;
	hrtimer_start(&mytimer, delay, HRTIMER_MODE_REL);

	return 0;
}

static void __exit mytimer_exit(void)
{
	hrtimer_cancel(&mytimer);
}

module_init(mytimer_init);
module_exit(mytimer_exit);
MODULE_LICENSE("GPL");

Further reading

There are more functions related to the hrtimers. See include/linux/hrtimer.h for a full list.
Other useful functions are:

int hrtimer_cancel(struct hrtimer *timer)
int hrtimer_try_to_cancel(struct hrtimer *timer)
int hrtimer_restart(struct hrtimer *timer)
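For example, a periodic timer can be built by letting the callback push its own expiry forward and ask to be restarted. A minimal sketch (the 100 ms period is just an example):

static enum hrtimer_restart my_periodic_callback(struct hrtimer *timer)
{
	/* Move the expiry forward past now by one period (100 ms) */
	hrtimer_forward_now(timer, ktime_set(0, 100 * 1000 * 1000));
	return HRTIMER_RESTART;
}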

Terminate a hanging SSH session

It may be very frustrating when an SSH session just hangs because the target is power cycling or something. Luckily for you, there is a "secret" escape sequence that allows you to terminate the session (and do a few other things).

The escape sequence is <enter>~X where X is a command letter. To see all available key sequences, type <enter>~?. Example output:

marcus@Ilos:~$ ~?
Supported escape sequences:
  ~.  - terminate connection (and any multiplexed sessions)
  ~B  - send a BREAK to the remote system
  ~C  - open a command line
  ~R  - Request rekey (SSH protocol 2 only)
  ~^Z - suspend ssh
  ~#  - list forwarded connections
  ~&  - background ssh (when waiting for connections to terminate)
  ~?  - this message
  ~~  - send the escape character by typing it twice
(Note that escapes are only recognized immediately after newline.)

<enter>~. is my favorite. It terminates the connection and keeps your mood cheerful.

Modules with parameters

Everybody knows that modules can take parameters, either via /sys/module/<module>/parameters or via the kernel command line, but how are these parameters created?

Parameters without callbacks

The Linux kernel provides the module_param() macro. The syntax is:

module_param(name, type, perm)

This will simply create the module parameter and expose it as an entry in /sys/module/<module>/parameters.

Code example

static bool debug_flag;
module_param(debug_flag, bool, S_IRUSR | S_IWUSR | S_IRGRP);
MODULE_PARM_DESC(debug_flag, "Set to 1 if debug should be enabled, 0 otherwise");

MODULE_PARM_DESC() sets a short description of the parameter. modinfo will read the description and present it for you.
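For example, with the parameter above built into a module (mymodule is a made-up name), the value may be given at load time or changed at runtime through sysfs:

insmod mymodule.ko debug_flag=1
echo 0 > /sys/module/mymodule/parameters/debug_flag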

Parameters with callbacks

Sometimes it may be useful to actually notify the driver that the value of a parameter has changed, which the regular module_param() macro does not do.

module_param_cb is the way to go. The macro takes two callback functions, set and get, that are called when the user (or the kernel, if set on the command line) interacts with the parameter. This is done by passing a struct kernel_param_ops to the macro. The syntax is:

module_param_cb(name, ops, arg, perm)

The module_param_cb macro is not heavily used in the kernel if we look in the drivers:

[06:40:35]marcus@tuxie:~/kernel$ git grep module_param_cb drivers/
drivers/acpi/sysfs.c:module_param_cb(debug_layer, &param_ops_debug_layer, &acpi_dbg_layer, 0644);
drivers/acpi/sysfs.c:module_param_cb(debug_level, &param_ops_debug_level, &acpi_dbg_level, 0644);
drivers/char/ipmi/ipmi_watchdog.c:module_param_cb(action, &param_ops_str, action_op, 0644);
drivers/char/ipmi/ipmi_watchdog.c:module_param_cb(preaction, &param_ops_str, preaction_op, 0644);
drivers/char/ipmi/ipmi_watchdog.c:module_param_cb(preop, &param_ops_str, preop_op, 0644);

In fact, there are just 5 entries; don't ask me why, I think the macro is terrific.
The interface is really simple: just fill in the kernel_param_ops struct and pass it to the module_param_cb macro.
I think the code is quite self-explanatory, so I will just post an example taken from drivers/acpi/sysfs.c.

Code example

static int param_get_debug_level(char *buffer, const struct kernel_param *kp)
{
	int result = 0;
	int i;

	result = sprintf(buffer, "%-25s\tHex        SET\n", "Description");
	for (i = 0; i < ARRAY_SIZE(acpi_debug_levels); i++) {
		result += sprintf(buffer + result, "%-25s\t0x%08lX [%c]\n",
				  acpi_debug_levels[i].name,
				  acpi_debug_levels[i].value,
				  (acpi_dbg_level & acpi_debug_levels[i].value)
				  ? '*' : ' ');
	}
	result += sprintf(buffer + result,
			  "--\ndebug_level = 0x%08X (* = enabled)\n",
			  acpi_dbg_level);

	return result;
}

static struct kernel_param_ops param_ops_debug_level = {
	.set = param_set_uint,
	.get = param_get_debug_level,
};
module_param_cb(debug_level, &param_ops_debug_level, &acpi_dbg_level, 0644);

There is also a set of standard set/get functions (the code above uses param_set_uint, for example).
These are called param_(set|get)_XXX, where XXX is byte, short, int, long and so on.
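If the point is to get notified when a value changes, a common pattern is to wrap one of these standard setters in your own set callback. A minimal sketch (my_threshold is a made-up parameter):

static int my_threshold;

static int my_threshold_set(const char *val, const struct kernel_param *kp)
{
	int ret = param_set_int(val, kp);

	if (!ret)
		pr_info("my_threshold changed to %d\n", my_threshold);
	return ret;
}

static const struct kernel_param_ops my_threshold_ops = {
	.set = my_threshold_set,
	.get = param_get_int,
};
module_param_cb(my_threshold, &my_threshold_ops, &my_threshold, 0644);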

Take a look in include/linux/moduleparam.h for further reading!

Interrupts, and how to divide them between cores

Symmetric MultiProcessing (SMP) is becoming increasingly common in embedded systems. Many chip manufacturers such as Texas Instruments and Freescale offer multicore ARM processors, and even FPGA manufacturers like Xilinx and Altera have System-on-Chips with multiple ARM cores. One benefit of these SoCs is that it is even possible to add soft cores in the programmable logic if necessary.

The trend is clear: multiple cores are here, and they are not likely to get fewer.

But how do we share resources between the cores?
One way is to use cgroups (Control Groups), which were merged in kernel version 2.6.24. Cgroups let you divide resources like cores and memory among specific task groups. More about that in another post.

It is also possible to control which processor cores are allowed to handle a specific interrupt. This is done via procfs.
There are a few entries in procfs related to interrupt handling. The first interesting entry is /proc/interrupts, which records the number of interrupts per CPU. Besides the counts for the hard IRQ numbers, it also includes interrupts internal to the system that are not associated with a device.
Examples of these are NMI (non-maskable interrupts) and LOC (local timer interrupts).
It also gives us the name of the registered IRQ handler.

Example output (excerpt, showing the ethernet controller on IRQ 150):

# cat /proc/interrupts
           CPU0       CPU1
150:      19293          0       GIC  enet

SMP affinity is controlled by manipulating files in the /proc/irq directory.
In /proc/irq there is the file default_smp_affinity, which specifies the default affinity mask that applies to all non-active IRQs. Once an IRQ is allocated/activated, its affinity bitmask will be set to the default mask.
The default mask is set so that all available CPU cores are allowed to handle the interrupt.
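On a two-core system where both cores may handle all interrupts, the default mask thus reads 3 (bits 0 and 1 set):

# cat /proc/irq/default_smp_affinity
3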

In /proc/irq there are also directories corresponding to each IRQ present in the system. These directories have a few entries, two of which are of interest.

Here are the entries for IRQ 150, which corresponds to the ethernet controller (look at the output from /proc/interrupts):

# ls /proc/irq/150
affinity_hint smp_affinity spurious
node smp_affinity_list

smp_affinity_list contains the processor cores that are allowed to handle the interrupt in list format, and smp_affinity is the same information as a hexadecimal bitmask.

The current values are:

# cat /proc/irq/150/smp_affinity_list
0-1
# cat /proc/irq/150/smp_affinity
3

This tells us that processor cores 0 and 1 are able to handle the interrupt, and that both core 0 and core 1 (bits 0 and 1 set gives us 3 decimal) are allowed to handle it.
If we look at the output from /proc/interrupts, we see that all interrupts (19293) have been handled by CPU0. If we just want CPU1 to handle these interrupts, set smp_affinity to 2:

# echo 2 > /proc/irq/150/smp_affinity

Then look at /proc/interrupts again:

# cat /proc/interrupts
           CPU0       CPU1
150:      20967         12       GIC  enet

The CPU1 interrupt counter is ticking and the interrupt is now only served by CPU1.

Resources

Documentation/IRQ-affinity.txt

Linux memory overcommit

Linux is generous in terms of memory; it will almost never fail requests from malloc(3) and friends. What does this mean in practice, and how may it be a potential issue?

In short, memory overcommit means that the system will give an application as much memory as it asks for, even if that much physical memory is not available. How may this work?
Well, the requested memory comes with one small restriction: the application is given as much memory as it demands as long as it is not actually going to use it. Seriously?
Yes, and it is pretty clever too.

The main purpose is to optimize memory handling by avoiding swapping out memory as much as possible. The application does not _really_ need the memory before it touches it anyway (if it is going to be used at all), and it is not unlikely that another application has freed memory before the allocated memory is actually used. A swap has been avoided.

Now, think about an embedded system without a swap area and with a limited amount of memory. Is memory overcommit still a good thing? It could be. It could also be a treacherous, unpredictable demon that haunts seemingly random devices.

In the case where the application uses a library that allocates tons of memory but is never going to use it all, memory overcommit is pretty good, because the application may not even start without it. A weird example? Not at all; let me just say three words: Qt with QML.

On the other hand, if the application really intends to use the memory, we have a problem.
It is even worse if the application only uses the memory under specific circumstances that are hard to track down.

If the system is running out of memory, the unforgiving Out Of Memory (OOM) killer will terminate an (almost) random application in desperation.
This randomness makes it a little bit tricky. The victim may be your SSH server, your logging server, your application or whatever stands in the way of the OOM killer.

The Linux kernel supports the following overcommit handling modes (refer to Documentation/vm/overcommit-accounting):

– 0: Heuristic overcommit handling. Obvious overcommits of address space are refused. This is the default.
– 1: Always overcommit.
– 2: Don't overcommit. The total address space commit for the system is not permitted to exceed swap plus a configurable percentage (vm.overcommit_ratio, default 50) of physical RAM.

In practice

The overcommit policy is set via the sysctl vm.overcommit_memory or by writing to /proc/sys/vm/overcommit_memory.

For example:

echo 2 > /proc/sys/vm/overcommit_memory
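When running in mode 2, it may be interesting to see how much memory is actually committed and what the enforced limit is; /proc/meminfo exposes this in the CommitLimit and Committed_AS fields:

grep -i commit /proc/meminfo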

Quickfix in VIM

One of the most fascinating things about the VIM editor is that you find new features every day, even after many years of heavy usage.

Quickfix is one of those features.

Quickfix parses your compiler output and lets you easily navigate to the lines with errors and/or warnings. Great, huh?

How does it work?

The builtin command is (surprisingly) make.
Just do
:make (without the bang character!)

make will do whatever the makeprg variable is set to. By default makeprg is set to … make (as in GNU Make, not the builtin command).

For example, if you are programming ruby, you may set makeprg to "ruby -c %".

I compile the Linux kernel a few times a day, so I usually set the variable like this:

set makeprg=make\ ARCH=arm\ CROSS_COMPILE=arm-none-linux-gnueabi-\ uImage

(Note that the spaces have to be backslash-escaped when setting the option.)

Ok, the stuff is compiling, now what?

Typical commands you will use are:
:copen – open the quickfix window as a new buffer
:cnext – or :cn, jump to the next error
:cprevious – or :cp, jump to the previous error
:cclose – close the quickfix window

And the best of all:

:cwindow – or :cw, open the quickfix window if there are errors, and close it if there are none.
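Since :cnext and :cprevious quickly become frequent commands, a pair of key mappings in your ~/.vimrc may be handy (F5/F6 are just a suggestion, pick any free keys):

nnoremap <F5> :cnext<CR>
nnoremap <F6> :cprevious<CR>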

LDD without LDD

I often meet colleagues at work who get frustrated when they try to print the shared library dependencies for an ELF executable or library, and the ldd command is simply stripped out from the target. (I do often strip targets 🙂 )

As if that would be a big problem.

The ldd command is not a binary executable, but a script that simply calls the runtime dynamic linker with a few environment variables set, and you may do the same!
The essential environment variable in this case is LD_TRACE_LOADED_OBJECTS, which should be set to something != 0.

In short, you may do:

LD_TRACE_LOADED_OBJECTS=1 /lib/ld-linux.so.2 ./my_application

Even the --list option may be used, but it does not work on all targets:

/lib/ld-linux.so.2 --list ./my_application

Example output with ldd:

[11:16:58]marcus@tuxie:/tmp/a$ ldd ./main
	linux-vdso.so.1 =>  (0x00007fffc8bff000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f564e254000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f564e647000)

Example output without ldd:

[11:17:23]marcus@tuxie:/tmp/a$ LD_TRACE_LOADED_OBJECTS=1 /lib64/ld-linux-x86-64.so.2 ./main
	linux-vdso.so.1 =>  (0x00007fffeecbc000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcce7700000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fcce7af3000)

So, do not get frustrated, be happy.