Minimalistic Linux threading

Embedded system development always poses certain challenges, and threading support is one of them. Sometimes including POSIX threads (pthreads) is not an option, but fortunately the Linux kernel lets you create light-weight processes (LWPs), which behave much like threads. The key function here is clone. fork uses clone internally too, but clone lets you create child processes with different settings, including CLONE_VM (share memory between the parent and child processes), which is essential for threading.
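Before the full program, here is a minimal sketch of what CLONE_VM changes. The helper names (set_flag, value_seen_by_parent) and the 64 KiB stack size are my own choices for illustration, not part of any API. The child writes through a pointer into the parent's memory; with CLONE_VM the parent should see the write, and without it the child only changes its own copy.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical stack size, ample for the tiny child below
#define DEMO_STACK_SIZE (64 * 1024)

// Runs in the child: writes through a pointer into the parent's memory
static int set_flag(void *argument) {
    *(int *)argument = 42;
    return 0;
}

// Returns the value the parent observes after the child exits,
// or -1 on failure. share_memory != 0 adds CLONE_VM to the flags.
int value_seen_by_parent(int share_memory) {
    int value = 0;
    int flags = SIGCHLD | (share_memory ? CLONE_VM : 0);
    char *stack = malloc(DEMO_STACK_SIZE);
    pid_t tid;

    if (stack == NULL)
        return -1;

    // Pass the address just past the end of the stack buffer
    tid = clone(set_flag, stack + DEMO_STACK_SIZE, flags, &value);
    if (tid == -1) {
        free(stack);
        return -1;
    }
    // If waiting fails the child may still be running,
    // so leak the stack rather than free it under the child
    if (waitpid(tid, NULL, 0) != tid)
        return -1;

    free(stack);
    return value;
}
```

Calling value_seen_by_parent(1) exercises the thread-like case, and value_seen_by_parent(0) the fork-like one.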

Without further ado, here is a complete example; I'll describe the key pieces below:

// Linux light-weight processes usage example
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Default thread number
#define THREAD_NUM 5

// printf-style helper that writes to stderr
int message(const char *format, ...) {
    va_list arg_list;
    int ret;

    va_start(arg_list, format);
    // Unbuffered output to avoid the threading artefacts
    ret = vfprintf(stderr, format, arg_list);
    va_end(arg_list);
    return ret;
}

// The child thread will execute this function
int thread_function(void* argument) {
    pid_t tid = (pid_t)syscall(SYS_gettid);
    message("Thread number %d (TID: %d) has been called\n",
            (int)(intptr_t)argument, (int)tid);
    return 0;
}

int main() {
    struct rlimit rlim;
    void* stack_list[THREAD_NUM];
    pid_t tid_list[THREAD_NUM];
    int i = 0;
    size_t stack_size;

    int ret = getrlimit(RLIMIT_STACK, &rlim);
    if (ret == -1) {
        perror("getrlimit: could not get stack size limit");
        return 1;
    }
    stack_size = (size_t)rlim.rlim_cur;
    message("Stack size limit: %zu bytes\n", stack_size);

    for (; i < THREAD_NUM; ++i) {
        void* stack;
        pid_t tid;

        // Allocate the stack
        stack = malloc(stack_size);
        if (stack == 0) {
            perror("malloc: could not allocate stack");
            return 1;
        }

        // Call the clone syscall to create the child thread.
        // The stack grows down on most architectures, so pass
        // the address just past the end of the allocated block.
        tid = clone(&thread_function,
                    (char*) stack + stack_size,
                    SIGCHLD | CLONE_SIGHAND | CLONE_VM,
                    (void*)(intptr_t)i);
        if (tid == -1) {
            perror("clone: could not create child thread");
            return 1;
        }
        message("%d-th child thread has been created (TID: %d)\n",
                i, (int)tid);

        // Save the thread information
        tid_list[i] = tid;
        stack_list[i] = stack;
    }

    // Clean-up phase
    for (i = 0; i < THREAD_NUM; ++i) {
        pid_t pid;

        // Wait for the child thread to exit
        pid = waitpid(tid_list[i], 0, 0);
        if (pid == -1) {
            perror("waitpid: could not wait for child thread");
            return 1;
        }

        // Free the stack
        free(stack_list[i]);
    }

    return 0;
}

First, we get the stack size for the children. You can use any hardcoded value instead (for example, if getrlimit doesn't work correctly on your version of the kernel, or if the stack limit is set to unlimited).
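One way to combine both approaches is sketched below: ask getrlimit first and fall back to a hardcoded value when the answer is unusable. The function name pick_stack_size and the three size constants are assumptions of mine, so tune them to your platform.

```c
#include <stddef.h>
#include <sys/resource.h>

// Hypothetical bounds, not taken from any standard
#define MIN_STACK_SIZE      ((size_t)16 * 1024)
#define FALLBACK_STACK_SIZE ((size_t)1024 * 1024)
#define MAX_STACK_SIZE      ((size_t)8 * 1024 * 1024)

// Returns a stack size in bytes that is safe to pass to malloc
size_t pick_stack_size(void) {
    struct rlimit rlim;
    size_t size = FALLBACK_STACK_SIZE;

    // Trust the limit only if the call succeeded and the limit is finite
    if (getrlimit(RLIMIT_STACK, &rlim) == 0 &&
        rlim.rlim_cur != RLIM_INFINITY) {
        size = (rlim.rlim_cur > MAX_STACK_SIZE)
                   ? MAX_STACK_SIZE
                   : (size_t)rlim.rlim_cur;
    }
    if (size < MIN_STACK_SIZE)
        size = MIN_STACK_SIZE;

    // Round down to a multiple of 16 so the top of the stack stays aligned
    return size & ~(size_t)15;
}
```

In the example above, the result would replace the direct use of rlim.rlim_cur when allocating each child's stack.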

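Second, we call clone with the SIGCHLD | CLONE_SIGHAND | CLONE_VM flags. CLONE_VM makes the child share the parent's memory, and CLONE_SIGHAND makes it share the parent's signal handlers. SIGCHLD is the signal sent to the parent when the child exits, and it is what lets the later plain waitpid call find the child. If you don't want or need to wait for the children, you can omit SIGCHLD (however, you still have to free the allocated stacks somehow after the children exit).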
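To illustrate the role of the exit signal, here is a small sketch (the names exit_with_seven and spawn_and_reap are made up for this post). As far as I can tell from the clone and waitpid man pages, a child created without SIGCHLD can still be reaped, but waitpid then needs the __WCLONE option.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical stack size, ample for the tiny child below
#define DEMO_STACK_SIZE (64 * 1024)

// Runs in the child: the return value becomes its exit status
static int exit_with_seven(void *argument) {
    (void)argument;
    return 7;
}

// exit_signal is SIGCHLD or 0 (no signal on exit).
// Returns the child's exit status, or -1 on failure.
int spawn_and_reap(int exit_signal) {
    int status = 0;
    // Children that do not deliver SIGCHLD are only visible with __WCLONE
    int wait_options = (exit_signal == SIGCHLD) ? 0 : (int)__WCLONE;
    char *stack = malloc(DEMO_STACK_SIZE);
    pid_t tid;

    if (stack == NULL)
        return -1;

    tid = clone(exit_with_seven, stack + DEMO_STACK_SIZE,
                CLONE_VM | exit_signal, NULL);
    if (tid == -1) {
        free(stack);
        return -1;
    }
    // On failure the child may still be running, so keep its stack
    if (waitpid(tid, &status, wait_options) != tid)
        return -1;

    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both spawn_and_reap(SIGCHLD) and spawn_and_reap(0) should report the same exit status; only the wait options differ.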

One more important function here is gettid (called as syscall(SYS_gettid)), which lets the child get its own thread ID. It is the same value that clone returns to the parent, and it can be used for more complicated thread management.
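The claim that both values match is easy to check with a sketch like the following. The names record_tid and tid_matches_clone_result are hypothetical; the child stores its gettid result in shared memory and the parent compares it with what clone returned.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stack size, ample for the tiny child below
#define DEMO_STACK_SIZE (64 * 1024)

// Runs in the child: stores its own thread ID where the parent can read it
static int record_tid(void *argument) {
    *(pid_t *)argument = (pid_t)syscall(SYS_gettid);
    return 0;
}

// Returns 1 if the child's gettid equals clone's return value,
// 0 if they differ, and -1 on failure
int tid_matches_clone_result(void) {
    pid_t seen_by_child = 0;
    char *stack = malloc(DEMO_STACK_SIZE);
    pid_t tid;

    if (stack == NULL)
        return -1;

    // CLONE_VM is required so that the child's write is visible here
    tid = clone(record_tid, stack + DEMO_STACK_SIZE,
                SIGCHLD | CLONE_VM, &seen_by_child);
    if (tid == -1) {
        free(stack);
        return -1;
    }
    // On failure the child may still be running, so keep its stack
    if (waitpid(tid, NULL, 0) != tid)
        return -1;

    free(stack);
    return seen_by_child == tid;
}
```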

Of course, effective thread management requires much more functionality (locking, synchronization, atomic operations, etc.), and it's much easier to use an existing thread framework such as POSIX threads. In certain cases, however, that is impossible due to platform limitations, whereas clone is always available in the Linux kernel and can lend a hand if needed.
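As a taste of the atomic operations mentioned above, here is a sketch that assumes your compiler provides C11 <stdatomic.h>. The worker count, iteration count and function names are arbitrary choices of mine. Several clone children increment one shared counter; with atomic_fetch_add no increment should be lost, while a plain ++ on an ordinary int could lose updates.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical demo parameters
#define WORKERS 4
#define INCREMENTS 100000
#define DEMO_STACK_SIZE (64 * 1024)

// Runs in each child: bumps the shared counter INCREMENTS times
static int add_many(void *argument) {
    atomic_int *counter = argument;
    for (int i = 0; i < INCREMENTS; ++i)
        atomic_fetch_add(counter, 1);
    return 0;
}

// Returns the final counter value, or -1 on failure
int count_with_workers(void) {
    atomic_int counter;
    char *stacks[WORKERS];
    pid_t tids[WORKERS];
    int started = 0;
    int failed;

    atomic_init(&counter, 0);

    for (; started < WORKERS; ++started) {
        stacks[started] = malloc(DEMO_STACK_SIZE);
        if (stacks[started] == NULL)
            break;
        tids[started] = clone(add_many, stacks[started] + DEMO_STACK_SIZE,
                              SIGCHLD | CLONE_VM, &counter);
        if (tids[started] == -1) {
            free(stacks[started]);
            break;
        }
    }
    failed = (started < WORKERS);

    // Reap every child that was started, then release its stack
    for (int i = 0; i < started; ++i) {
        if (waitpid(tids[i], NULL, 0) == tids[i])
            free(stacks[i]);
        else
            failed = 1;  // child may still run: keep its stack
    }
    return failed ? -1 : atomic_load(&counter);
}
```

Note that the children never call malloc or other libc routines that rely on per-thread state, which keeps this sketch on the safe side.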

