Linux Driver Development
Introduction & Setup
What is a driver? A miserable little pile of secrets. Today we'll perform the laborious task of setting up a linux kernel and module development environment, and write some simple drivers for it. If you have no prior exposure to these topics, this will probably take you around an hour, so brew some sludge and buckle up!
Development Machine
To develop kernel modules, you have 2 main options:
- develop on your existing system with its headers
- develop in a virtual-machine (e.g. qemu, virtualbox, etc...)
I personally use virtualbox because it's easy. The main risk of developing kernel drivers directly on your own system is that a bug in one of your drivers can crash it or corrupt its data.
The setup we will do works with either option and assumes an Ubuntu 18.04 LTS system, so get yourself set up now and continue reading.
Linux Kernel Background
The Linux kernel is a "monolithic kernel", which in practice means it has a massive API and (usually) a massive kernel size given all it supports. Let's think back to a hypothetical class and think about what the operating system manages: multiprogramming, multithreading, multiuser, filesystems, drivers, memory management, and a lot more.
Unlike the UNIX kernel which is a top-down carefully designed kernel, the linux kernel is one that started out fairly small but kept growing more and more subsystems and had to reorganize to maintain coherency. So, aside from identifying disparate subsystems, it can't be said to have a specific design.
Formally speaking, the major subsystems of linux are:
Virtual File System (VFS) - Unified interface to access stored data
Memory Manager (MM) - The main magic trick here is giving each process the illusion that it has access to all of memory, using a combination of virtual memory and page tables.
Process Scheduler - Manages CPU time and switches between the contexts of processes
System Call Interface (SCI) - Takes arguments from user-space, executes a procedure in kernel space, and then returns to the original context
Device Drivers - How we interface with devices
Network Subsystem - Implementations of basically all low-level network protocols
Inter-Process Communication - SysV and POSIX are available, as well as more...
Each subsystem has interfaces between itself and user-space, as well as between the subsystem and a kernel module. User-space APIs and ABIs have a guarantee of stability, but interfaces between internal kernel code and other kernel code (e.g. a module we'll write) have fewer stability guarantees and can change between minor kernel versions.
Despite this, pretty much any module code you write will contain or embed a kobject, which can be thought of as a generic data structure (similar to PyObject when writing python extensions in C). Each kobject corresponds to a directory in sysfs; in other words, the core kernel data structure is hierarchical. We're not going to do much with kobjects directly, but here's what it looks like for an idea of that internal structure:
// struct kobject as of the 5.4 kernel (include/linux/kobject.h), debug-only fields omitted
struct kobject {
	const char *name; // name of the kobject, also its directory name in sysfs
	struct list_head entry; // list in which the kobject is inserted
	struct kobject *parent; // pointer to the parent kobject, if any
	struct kset *kset; // pointer to the containing kset
	struct kobj_type *ktype; // pointer to the kobject type descriptor
	struct kernfs_node *sd; // sysfs directory entry associated with the kobject
	struct kref kref; // reference counter for the kobject
	unsigned int state_initialized:1; // bookkeeping flags
	unsigned int state_in_sysfs:1;
	unsigned int state_add_uevent_sent:1;
	unsigned int state_remove_uevent_sent:1;
	unsigned int uevent_suppress:1;
};
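To make "embedding" a bit more concrete, here is a minimal user-space sketch of the pattern. Everything in it is made up for illustration: fake_kobject is a drastically simplified stand-in for the real struct kobject, my_device is a hypothetical driver structure, and my_container_of is a plain re-creation of the kernel's container_of() macro. The point is only that generic code can be handed the embedded object and still find its way back to the outer structure.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(): given a pointer to a
 * member, recover a pointer to the structure that embeds it. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-in for struct kobject. */
struct fake_kobject {
	const char *name;
	struct fake_kobject *parent;
};

/* Hypothetical driver structure that embeds the generic object. */
struct my_device {
	int id;
	struct fake_kobject kobj;
};

/* Generic code only ever sees the embedded object... */
static int device_id_from_kobj(struct fake_kobject *kobj)
{
	/* ...and walks back to the outer structure when it needs driver data. */
	struct my_device *dev = my_container_of(kobj, struct my_device, kobj);

	return dev->id;
}
```

Calling device_id_from_kobj(&d.kobj) on a struct my_device d returns d.id, which is the same trick the kernel uses to get from a kobject back to the object that owns it.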
Now let's move onto our actual setup.
Dev Environment Setup
The first step of developing a kernel module is to target a specific kernel version -- the ABI between versions often has a lot of changes, since very few subsystems that I've seen claim to have a stable ABI. Heck, from version 5.4 to 5.15 I've noticed that the way defaults are specified in the kernel build system itself changed! So if you want to develop drivers, make sure you know what system you are targeting.
We're targeting kernel 5.4, and we want to control what options are enabled in our kernel since that affects which subsystems are potentially available to the driver.
You can get the 5.4 kernel version from here or clone this massive git repo.
We can get a rough idea of where those vague operating system responsibilities we mentioned before live by unpacking a version of the linux kernel and looking at the base directories; here's a listing of the 5.15 kernel with some comments:
linux-5.15.65/
├── arch # architecture-specific code (cpu)
├── block # underlying (to VFS/FS) block I/O code path, implements page cache, generic block i/o, i/o scheduler, etc...
├── certs
├── crypto # kernel-level implementation of ciphers and kernel APIs to serve crypto consumers
├── Documentation
├── drivers
├── fs # implements the kernel ~Virtual Filesystem Switch~ (fs abstraction), and individual filesystem drivers e.g. ext4, nfs, ...
├── include # arch-independent kernel headers (arch dependent ones exist in arch/<cpu>/include/...)
├── init # arch-independent kernel initialization
├── ipc # inter-process communication code -- SysV, POSIX IPC and more
├── kernel # core kernel subsystem -- e.g. scheduling, locking, processes, timers, signaling, tracing, etc...
├── lib # a library for the kernel, note that the kernel does not support shared libraries like userspace
├── LICENSES
├── mm # most memory management code
├── net # complete impl of network protocol stack, and impl of tcp, udp, ip and more
├── samples
├── scripts # scripts for building and static analysis
├── security # impl of the Linux Security Module(LSM), a Mandatory Access Control(MAC) framework. SELinux is only one of the implementations
├── sound # Advanced Linux Sound Architecture (aka ALSA) subsystem
├── tools # mostly userspace apps that have a tight coupling with the kernel, e.g. perf
├── usr
└── virt # virtualization code, implements KVM
Also note that when things run in kernel space, they are not in the same execution mode as user-space and thus can't be linked to your usual favorite libraries -- it's the other way around, you're producing a relocatable object (a .ko file) that the kernel links into itself at load time.
After extracting the kernel source to a location, run export KSRC=/path/to/source. It'll come in useful when referencing the kernel tree.
Configuring The Kernel
After getting the kernel source set up, we need to create a base config to work from. There are a few approaches to configuring a kernel:
1) Base the config off an existing host system
2) Base it off a default config in the repo
3) Build the default kernel with make
Kernel configuration and building is done with a system called kbuild, whose configuration language (kconfig) manages dependencies between features named with strings like CONFIG_ABCD. It's a simple language that can express dependencies between different configuration items and whether enabled features are built as modules or built into the kernel, but we won't need to understand it too much for the purposes of just configuring the kernel. Just keep in mind that things like defaults, and rules like "if CONFIG_ABC is disabled then CONFIG_CDE can't be enabled", are handled by kbuild.
Without any other environment variables specified (like ARCH), the default architecture target will be the same as the host system.
Building a kernel by selecting every specific feature you care about is painstaking and laborious, and since you already have a target platform for the kernel modules, you can target that. If we ran make without any supplied arguments, we would actually be configuring with option (3), which uses the tested default config. We'll take approach (1) instead and generate a config similar to the host system using lsmod. Run the following steps:
lsmod > /tmp/lsmod.now
cd ${KSRC}
make LSMOD=/tmp/lsmod.now localmodconfig
The kernel will start running its kconfig configuration steps, and will start prompting you yes or no for whether each feature should be enabled. Just keep hitting enter to accept the defaults for now. At the end of this we will have a kconfig configuration in $KSRC/.config to change. You can open it to look at the entries of the form CONFIG_*, but you never want to edit this file manually.
For debugging purposes, we will enable the kernel address sanitizer (KASAN) so that memory bugs in our drivers, such as out-of-bounds accesses and use-after-free, are reported in the kernel log instead of silently corrupting memory. From $KSRC run make menuconfig to enter a menu, and use the tui menu to find the config item CONFIG_KASAN. You can hit / to enter a search prompt and find config items to then navigate to. In this case, you'll find this option in Kernel Hacking->Memory Debugging->KASAN.
To make this configuration truly yours, go to General Setup->Local Version to put in a version string that gets appended to the version of the kernel built. This will help identify the kernel in a later step.
Now we're going to build this kernel with make all -j$(nproc). If you run make help, you can see targets preceded with an asterisk (*); these are what all builds for us. Of particular interest are the targets vmlinux (uncompressed kernel), modules (features marked as 'm' in the config) and bzImage (compressed kernel).
Once building is finished, we're actually going to install the kernel onto the system and boot from it for development. First let's install the modules:
sudo make modules_install
so that the booted system can find its modules.
Finally, let's install the kernel using sudo make install. This step generates a new grub configuration file and initrd image (now called initramfs) and puts the compressed kernel, initial ramdisk/filesystem, and grub config into the correct locations in the system (/boot/ for the ramdisk and kernel image, /boot/grub/grub.cfg for the generated grub config).
Now let's make sure we can choose this kernel at boot. Edit /etc/default/grub to always show a boot prompt by setting GRUB_HIDDEN_TIMEOUT_QUIET=false, GRUB_TIMEOUT=4 so you can interrupt it, and GRUB_TIMEOUT_STYLE=menu. Now, run sudo update-grub and reboot the system. When the grub menu pops up and you go into "advanced", you should be able to select the kernel you just built!
On Cross Compiling
Note that we could build for an arm platform by simply installing a cross-toolchain (e.g. sudo apt-get install crossbuild-essential-armhf) and running make ARCH=arm CROSS_COMPILE=toolchain-prefix-, but then the starting config becomes more important and trickier to find. There are existing configs in arch/arm/configs/ including one for the raspberry pi 3 processor, bcm2709_defconfig. Note that vendor support is going to be more tricky than copying mainline, and vendors usually have their own kernel source tree branches and extra steps to actually get the kernel to boot.
Developing Kernel Modules
Now that we're running our safer kernel, let's install the headers we'll develop from. Run sudo apt-get install linux-headers-generic. Let's skip more setup and get something we can load:
Obligatory Hello-World Module
Make a new directory for the following makefile and source file.
Put the following in helloworld.c
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
MODULE_AUTHOR("reader");
MODULE_DESCRIPTION("Trivial hello world module");
MODULE_LICENSE("Dual MIT/GPL");
MODULE_VERSION("1.0");
static int __init helloworld_init(void)
{
	printk(KERN_INFO "Hello, kernel-space\n");
	return 0; /* success */
}
static void __exit helloworld_exit(void)
{
	printk(KERN_INFO "Goodbye, kernel-space :') \n");
}
// the names registered here must match the functions defined above
module_init(helloworld_init);
module_exit(helloworld_exit);
and makefile
PWD := $(shell pwd)
obj-m += helloworld.o
all:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules
install:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules_install
clean:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) clean
Run make and this will build the module with the current kernel's headers, which means that right after building, we can run sudo insmod helloworld.ko to load the module, sudo rmmod helloworld.ko to remove it, and then see our messages in dmesg.
Now let's think about what's happening here:
The MODULE_* macros put information in the module's .modinfo section so that we have some meta information about it, e.g. try modinfo helloworld.ko.
Like object-oriented constructs, our module has a constructor and destructor, which must have the signatures static int __init fn(void) and static void __exit fn(void) respectively. We indicate an error from the constructor by returning a negative errno value from it. The module_init(...) and module_exit(...) macros register these functions.
From here the type of kernel module we'll develop is a character driver, which furnishes a device that acts like a file for the purposes of interfacing. Let's drop some general info that more boring posters would have put before that intro code.
Driver Background
There are two main types of device drivers in linux, character and block. Nearly everything you access through a device file (usb, hard drives, microphones) falls into one of these two categories. Character drivers deal with streams of data, possibly as little as one character at a time, while block drivers deal with data being sent a block at a time (e.g. a chunk of 512 bytes as a minimum transaction unit).
There is a saying that "everything in linux is a file, and if it's not a file, it's a process". In both cases of drivers, what you're defining is the implementation of reading and writing to a device. That is, when a device is written to, the system calls for writing, opening, and closing the device are implemented in the driver, and it's very important to get it right because the kernel executes that code, not the user, so it has the privileges to read and write anywhere.
Block and character device drivers are defined by a major/minor number, which is a whole standard committee type affair where the major number denotes what type of driver it is, and the minor number has some implicit but flexible meaning that's mostly up to the developer.
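If you want to poke at major/minor numbers from user-space, the sketch below shows one way to do it. The helper name device_numbers is made up for this post; stat(), S_ISCHR/S_ISBLK and the major()/minor() macros from <sys/sysmacros.h> are the standard glibc interfaces on Linux.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Hypothetical helper: look up the device numbers behind a device node.
 * Returns 0 on success, -1 if the path can't be stat'ed or is neither a
 * character nor a block device. */
static int device_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
		return -1;
	/* st_rdev packs both numbers; the macros pull them apart */
	*maj = major(st.st_rdev);
	*min = minor(st.st_rdev);
	return 0;
}
```

Pointing this at a few entries under /dev is a quick way to see which nodes share a major number, and running ls -l on the same nodes shows the same pair of numbers where a file size would normally be.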
Since it's easier to set up, we're making a character driver for this tutorial.
A More Interesting Kernel Module
Within the two main categories, block and char, there are sub-types. For example, i2c, usb and mouse devices are sub-types of character drivers and hard drives and large memory devices are sub-types of block drivers. Linux furnishes us with frameworks to develop these sub-types so we don't have to re-implement basic communication methods like the i2c protocol.
We're going to implement a misc device, which is a character device with major number 10. And we're going to use it to encrypt input to read later.
rot13
Rot13 is a simple encryption technique, take a letter of the alphabet and shift it by 13 positions. Since the alphabet is 26 characters, applying rot13(rot13('a')) will return the original input. This lets us test if we've successfully developed a device. A simple C implementation for this would be:
// character mapping of rot13
unsigned char rot13_map(unsigned char a) {
if (a <= 'Z' && a >= 'A') {
a += 13;
if (a > 'Z')
a -= 26;
} else if (a <= 'z' && a >= 'a') {
a += 13;
if (a > 'z')
a -= 26;
}
return a;
}
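Before wrapping this in a driver, it's worth convincing yourself in plain user-space that the mapping really is its own inverse. The sketch below repeats rot13_map so it compiles on its own and adds rot13_buf, a helper invented here that transforms a whole buffer in place.

```c
#include <stddef.h>

/* Same mapping as rot13_map above, repeated so this sketch compiles alone. */
static unsigned char rot13_map(unsigned char a)
{
	if (a <= 'Z' && a >= 'A') {
		a += 13;
		if (a > 'Z')
			a -= 26;
	} else if (a <= 'z' && a >= 'a') {
		a += 13;
		if (a > 'z')
			a -= 26;
	}
	return a;
}

/* Hypothetical helper: apply the mapping to every byte of a buffer in place.
 * Non-letters pass through unchanged. */
static void rot13_buf(char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = (char)rot13_map((unsigned char)buf[i]);
}
```

Running rot13_buf once over "Hello, World!" gives "Uryyb, Jbeyq!", and running it a second time gives the original string back.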
So let's define a device interface around it.
First we need to include the standard linux module headers from before, and ones for memory management.
#include <linux/init.h>
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/slab.h> // k[m|z]alloc, k[z]free
#include <linux/fs.h> // file operations aka fops
#include <linux/uaccess.h> // provides copy_from/to_user() would be <asm/uaccess.h> for kernel version <= 4.11
The new functions we're using from these headers are:
// memory allocation
void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp);
void *devm_kmalloc(struct device *dev, size_t size, gfp_t gfp);
void *kmalloc(size_t size, gfp_t flags);
void *kzalloc(size_t size, gfp_t flags);
void kfree(const void *mem);
// memory movement, both return the number of bytes that could NOT be copied (0 on success)
unsigned long copy_to_user(void __user *dst, const void *src, unsigned long count);
unsigned long copy_from_user(void *dst, const void __user *src, unsigned long count);
We use copy_from/to_user instead of a simple memcpy because a pointer handed to us by user-space can't be trusted: it may be invalid or point into kernel space. These functions check the user-space address and fail cleanly instead of faulting, and they make it explicit what we are doing.
The memory functions kmalloc and kzalloc simply allocate memory from the kernel (kzalloc also zeroes it), while kfree frees it. A standard pattern would be to allocate required memory in the module init method and to free it in the exit function.
Note that we're using these functions instead of our normal calloc, malloc, free because we don't have those normal functions in a kernel context -- the kernel doesn't link against standard libraries like userspace programs do, so our modules can't either. Instead we have to use helpers available in kernel headers or implement them ourselves.
Finally, devm_kzalloc and devm_kmalloc are more interesting allocation functions -- they allocate managed memory that is associated with a specific device, and that memory is freed automatically when the driver is detached from the device or the device goes away. Note that there is a devm_kfree function, but if you're using it too much then you're probably using the wrong tool.
So now, let's add our module info and definition of rot13
MODULE_AUTHOR("Sergey Ivanov");
MODULE_DESCRIPTION("Simple rot13 driver that works like a pipe, each character written will be encrypted and available to re-read, lack of available characters results in EOF");
MODULE_LICENSE("Dual MIT/GPL");
MODULE_VERSION("0.1");
// character mapping of rot13
unsigned char rot13_map(unsigned char a) {
if (a <= 'Z' && a >= 'A') {
a += 13;
if (a > 'Z')
a -= 26;
} else if (a <= 'z' && a >= 'a') {
a += 13;
if (a > 'z')
a -= 26;
}
return a;
}
Note that we are defining this device to act like a pipe, that is input written in will be read out in a fifo manner.
And now we want to declare our file operations. Similar to python's pyobject system, we will put these into a structure to dereference when we perform file operations to the device.
// interface to the rot13 device
static int open_rot13(struct inode *inode, struct file *filp);
static int close_rot13(struct inode *inode, struct file *filp);
static ssize_t read_rot13(struct file *filp, char __user *ubuf, size_t count, loff_t *off);
static ssize_t write_rot13(struct file *filp, const char __user *ubuf, size_t count, loff_t *off);
static const struct file_operations rot13_fops = {
.open = open_rot13,
.read = read_rot13,
.write = write_rot13,
.release = close_rot13,
.llseek = no_llseek,
};
The struct file_operations contains more fields than the five we see, but these are enough to let us interact with a device. Note that seeking is fairly straightforward to implement, but in this context it doesn't make sense. When seeking is not supported by a device, we want to explicitly assign no_llseek to the .llseek field and change the open function to return nonseekable_open(...); otherwise someone may try to seek the device file and get a weird error.
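If the "struct full of function pointers" idea is new to you, here is a tiny user-space imitation of it. None of these names exist in the kernel: mini_ops plays the role of struct file_operations, demo_ops plays the role of a driver's table, and do_read/do_write play the role of the VFS code that dispatches through it. Returning -1 for a missing operation is a simplification; the kernel returns a proper errno.

```c
#include <stddef.h>

/* Hypothetical miniature of struct file_operations: a table of callbacks. */
struct mini_ops {
	long (*read)(char *buf, size_t count);
	long (*write)(const char *buf, size_t count);
};

/* A "driver" read callback that just fills the buffer with 'x'. */
static long demo_read(char *buf, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		buf[i] = 'x';
	return (long)count;
}

/* Only .read is filled in; members left out of a designated initializer
 * are zeroed, so .write is NULL. */
static const struct mini_ops demo_ops = {
	.read = demo_read,
};

/* The callers play the role of the VFS: dispatch through the table and
 * report an error when an operation was not provided. */
static long do_read(const struct mini_ops *ops, char *buf, size_t count)
{
	if (!ops->read)
		return -1;
	return ops->read(buf, count);
}

static long do_write(const struct mini_ops *ops, const char *buf, size_t count)
{
	if (!ops->write)
		return -1;
	return ops->write(buf, count);
}
```

This is why leaving a member out of rot13_fops is meaningful: the caller sees a NULL pointer and falls back to a default or an error rather than calling into the driver.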
Now, you usually want a kernel module to allocate memory for each attached device, but for simplicity we will operate on a single context reached through a static pointer. The following struct will contain all our major info.
// define our data
static struct rot13_ctx {
struct device *dev;
int written;
int data_size;
char data[];
} *rot13_context;
#define INITIAL_DATA_SIZE 256
Now, using our miscdevice type, we can simplify initialization compared to calling register_chrdev_region ourselves:
static struct miscdevice rot13_miscdev = {
.minor = MISC_DYNAMIC_MINOR, // kernel dynamically assigns a free minor
.name = "rot13", // misc_register() auto-creates /dev/rot13 and entries in sysfs
.mode = 0666,
.fops = &rot13_fops
};
// now define our methods
static int __init rot13_init(void) {
	int ret = 0;
	struct device *dev;
	ret = misc_register(&rot13_miscdev);
	if (ret) {
		pr_notice("rot13 misc device registration failed, aborting\n");
		return ret;
	}
	dev = rot13_miscdev.this_device;
	pr_info("rot13 registered, minor# = %d - dev node is /dev/%s\n",
		rot13_miscdev.minor, rot13_miscdev.name);
	// our static memory points to kernel managed device memory
	rot13_context = devm_kzalloc(dev, sizeof(struct rot13_ctx) + INITIAL_DATA_SIZE, GFP_KERNEL);
	if (unlikely(!rot13_context)) {
		misc_deregister(&rot13_miscdev); // don't leave /dev/rot13 behind on failure
		return -ENOMEM;
	}
	rot13_context->data_size = INITIAL_DATA_SIZE;
	rot13_context->dev = dev;
	dev_dbg(rot13_context->dev, "driver initialized\n");
	return 0;
}
static void __exit rot13_exit(void) {
misc_deregister(&rot13_miscdev);
pr_info("rot13 driver deregistered\n");
}
In our init function, we initialize the device's memory. GFP_KERNEL means that the allocator can put the current process to sleep waiting for a page when called in low-memory situations.
Note that unlikely() is a passthrough macro that hints to the compiler that a branch is unlikely to be taken.
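For the curious, the hint is a thin wrapper around a compiler builtin. The sketch below is a user-space approximation that assumes GCC or Clang (which provide __builtin_expect); first_byte is a throwaway function made up to show the usual way of writing a NULL check with it.

```c
/* Roughly how the kernel defines these hints when built with GCC or Clang.
 * The !! collapses any nonzero value to 1 so the builtin compares cleanly. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: returns -1 for a NULL buffer, else its first byte. */
static int first_byte(const unsigned char *buf)
{
	if (unlikely(!buf))	/* negate inside the hint, not outside it */
		return -1;
	return buf[0];
}
```

The macro never changes what the condition evaluates to, only how the compiler lays out the branches. That also means writing !unlikely(ptr) still behaves correctly, but it tells the compiler the opposite of what you meant, so unlikely(!ptr) is the form to prefer.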
Now let's implement our file operations:
static int open_rot13(struct inode *inode, struct file *filp) {
// if we allow seeking, use regular open
return nonseekable_open(inode, filp);
}
static int close_rot13(struct inode *inode, struct file *filp) {
return 0;
}
static ssize_t read_rot13(struct file *filp, char __user *ubuf, size_t count, loff_t *off) {
	ssize_t ret = 0; // returning 0 signals EOF to the reader
	if (count == 0 || rot13_context->written == 0)
		goto out; // nothing buffered, report EOF as the module description promises
	if (count > rot13_context->written)
		count = rot13_context->written;
	if (copy_to_user(ubuf, rot13_context->data, count)) {
		ret = -EFAULT;
		goto out;
	}
	// drop what was read and shift the remainder to the front of the buffer
	rot13_context->written -= count;
	memmove(rot13_context->data, rot13_context->data + count, rot13_context->written);
	ret = count; // report the bytes actually copied, not the bytes requested
out:
	return ret;
}
static ssize_t write_rot13(struct file *filp, const char __user *ubuf, size_t count, loff_t *off) {
	ssize_t ret = count;
	char *start, *end; // for transforming
	struct rot13_ctx *new_ctx; // for if we need to reallocate memory
	size_t needed = rot13_context->written + count;
	// is our buffer big enough?
	if (needed > rot13_context->data_size) {
		// at least double the buffer, and grow further if this one write needs it
		size_t new_size = 2 * (size_t)rot13_context->data_size;
		if (new_size < needed)
			new_size = needed;
		pr_info("growing rot13 buffer to %zu\n", new_size);
		new_ctx = devm_kmalloc(rot13_context->dev, sizeof(struct rot13_ctx) + new_size, GFP_KERNEL);
		if (unlikely(!new_ctx)) {
			ret = -ENOMEM;
			goto out;
		}
		memcpy(new_ctx, rot13_context, sizeof(struct rot13_ctx) + rot13_context->written);
		new_ctx->data_size = new_size; // only record the new size once the allocation succeeded
		devm_kfree(rot13_context->dev, rot13_context);
		rot13_context = new_ctx;
	}
	// copy from userspace memory into kernel memory
	if (copy_from_user(rot13_context->data + rot13_context->written, ubuf, count)) {
		ret = -EFAULT;
		goto out;
	}
	// if we were doing real encryption, we'd do it on input so memory would be obfuscated
	start = rot13_context->data + rot13_context->written;
	rot13_context->written += count;
	end = rot13_context->data + rot13_context->written;
	while (start < end) {
		*start = rot13_map(*start);
		++start;
	}
out:
	return ret;
}
module_init(rot13_init);
module_exit(rot13_exit);
Note that when we return an error, we return the negative of its errno value by convention. The system call then returns -1 to the user-space program and sets its errno to the positive value. A related convention exists for kernel functions that return pointers: they can encode an error code in the pointer itself, using the ERR_PTR() and PTR_ERR() macros.
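To show how an error code can ride inside a pointer, here is a user-space re-creation of the idea. The my_-prefixed helpers are written for this post and only mimic what the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() do, and find_name is a made-up lookup function. The trick relies on the top 4095 addresses never being valid pointers, which is an assumption that holds for the kernel and for ordinary Linux processes.

```c
#include <errno.h>
#include <stdint.h>

#define MY_MAX_ERRNO 4095

/* Pack a negative errno into a pointer value. */
static inline void *my_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Unpack the errno again. */
static inline long my_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* A pointer is an encoded error if it lands in the last 4095 addresses. */
static inline int my_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MY_MAX_ERRNO);
}

/* Hypothetical lookup that returns either a valid string or an encoded error. */
static const char *find_name(int id)
{
	static const char *names[] = { "zero", "one" };

	if (id < 0 || id > 1)
		return my_err_ptr(-ENOENT);
	return names[id];
}
```

A caller checks my_is_err() on the result before dereferencing it, and uses my_ptr_err() to recover the negative errno, so a single return value can carry either a pointer or an error.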
Our makefile will be almost the same as our helloworld one.
PWD := $(shell pwd)
ccflags-y := -std=gnu99
obj-m += rot13_v1.o
all:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules
install:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules_install
clean:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) clean
Now build the module and run sudo insmod rot13_v1.ko and you'll see a device /dev/rot13. You can write to it with echo "secret phrase" > /dev/rot13 and then read from it simply with cat /dev/rot13.
Interfacing with sysfs, homework
Let's interface with a subsystem: sysfs! These are the files that show up in /sys/ and let us set up configurable values. We'll use this to control the size of the file we're writing to.
We're also going to change how writing to/from the module works! Instead of a pipe, we'll treat it like a random-access file. We're going to expose our variables directly, and thus have to protect against multiple simultaneous reads/writes from user-space with mutexes.
We have a couple new headers in the start of our file.
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/miscdevice.h>
#include <linux/slab.h> // k[m|z]alloc, k[z]free
#include <linux/fs.h> // file operations aka fops
#include <linux/mutex.h> // multi process operations
#include <linux/uaccess.h> // would be <asm/uaccess.h> for kernel version <= 4.11
Our module info stays the same roughly
MODULE_AUTHOR("Sergey Ivanov");
MODULE_DESCRIPTION("Simple rot13 driver that works like an infinite file, demonstrating sysfs and seeking");
MODULE_LICENSE("Dual MIT/GPL");
MODULE_VERSION("0.1");
DEFINE_MUTEX(rot13_mtx);
Our struct has some new members, and we're not using the flexible array member anymore, so we'll change how we realloc:
// reorganized to start with our data
static struct rot13_ctx {
struct device *dev;
int rx, tx;
int data_size;
#define INITIAL_DATA_SIZE 256
char *data;
} *rot13_context;
// resize the data component of a struct rot13_ctx, allowing for uninitialized
int resize_rot13_data(struct rot13_ctx *p, size_t new_data_size){
	if (p->data == NULL) {
		p->data = kzalloc(new_data_size, GFP_KERNEL);
		if (unlikely(!p->data)) return -ENOMEM; // negate inside unlikely(), not outside
		p->data_size = new_data_size;
	} else {
		void* newbuf = kzalloc(new_data_size, GFP_KERNEL);
		if (unlikely(!newbuf)) return -ENOMEM;
		// keep as much of the old contents as fits in the new buffer
		memcpy(newbuf, p->data, (new_data_size > p->data_size ? p->data_size : new_data_size));
		kfree(p->data);
		p->data = newbuf;
		p->data_size = new_data_size;
	}
	return 0;
}
We're going to be using the macros DEVICE_ATTR_RW and DEVICE_ATTR_RO, each of which simultaneously creates a struct device_attribute and assigns its members based on the name.
// ---- sysfs handlers ----
static ssize_t rot13_data_size_show(struct device *dev, struct device_attribute *attr, char *buf);
static ssize_t rot13_data_size_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count);
static ssize_t rot13_rx_show(struct device *dev, struct device_attribute *attr, char *buf);
static ssize_t rot13_tx_show(struct device *dev, struct device_attribute *attr, char *buf);
// these macros instantiate a struct device_attribute named dev_attr_<name> here
// the names of the r/w callback functions are <name>_show and <name>_store
static DEVICE_ATTR_RO(rot13_rx);
static DEVICE_ATTR_RO(rot13_tx);
static DEVICE_ATTR_RW(rot13_data_size);
A typo here can cost some developer time, so make sure you double-check your names.
Our file interface now contains a seek operation.
// interface to the rot13 device
static int open_rot13(struct inode *inode, struct file *filp);
static int close_rot13(struct inode *inode, struct file *filp);
static ssize_t read_rot13(struct file *filp, char __user *ubuf, size_t count, loff_t *off);
static ssize_t write_rot13(struct file *filp, const char __user *ubuf, size_t count, loff_t *off);
static loff_t llseek_rot13(struct file *filp, loff_t, int);
static const struct file_operations rot13_fops = {
.open = open_rot13,
.read = read_rot13,
.write = write_rot13,
.release = close_rot13,
.llseek = llseek_rot13,
};
and our initial struct stays the same
static struct miscdevice rot13_miscdev = {
.minor = MISC_DYNAMIC_MINOR, // kernel dynamically assigns a free minor
.name = "rot13", // misc_register() auto-creates /dev/rot13 and entries in sysfs
.mode = 0666,
.fops = &rot13_fops
};
Now in the init function we use some sysfs initialization functions that have to be cleaned up later. They are analogous to our register function but for sysfs nodes. The pattern of registering/deregistering should be fairly clear by now.
// now define our methods
static int __init rot13_init(void) {
int ret = 0;
struct device *dev;
ret = misc_register(&rot13_miscdev);
if (ret) {
pr_notice("rot13 misc device registration failed, aborting\n");
goto out;
}
dev = rot13_miscdev.this_device;
pr_info("rot13 registered, minor# = %d - dev node is /dev/%s\n",
rot13_miscdev.minor, rot13_miscdev.name);
// our static memory points to kernel managed device memory
rot13_context = devm_kzalloc(dev, sizeof(struct rot13_ctx), GFP_KERNEL);
if (unlikely(!rot13_context)) { // init struct, check before we touch it
	ret = -ENOMEM;
	goto o3;
}
rot13_context->dev = dev;
ret = resize_rot13_data(rot13_context, INITIAL_DATA_SIZE); // init payload
if (unlikely(ret))
	goto o3;
if (IS_ENABLED(CONFIG_SYSFS)) {
#define test_failure_goto(t, label) if ((t)) \
{ pr_info("device_create_file failed (%d), aborting\n", (t)); goto label; }
ret = device_create_file(dev, &dev_attr_rot13_rx);
test_failure_goto(ret, o3);
ret = device_create_file(dev, &dev_attr_rot13_tx);
test_failure_goto(ret, o2);
ret = device_create_file(dev, &dev_attr_rot13_data_size);
test_failure_goto(ret, o1);
} else {
pr_info("no sysfs accessible to rot13");
}
dev_dbg(dev, "driver initialized");
o1: device_remove_file(dev, &dev_attr_rot13_data_size);
o2: device_remove_file(dev, &dev_attr_rot13_tx);
o3: misc_deregister(&rot13_miscdev);
out:
return ret;
};
static void __exit rot13_exit(void) {
kfree(rot13_context->data);
// clean up sysfs nodes
device_remove_file(rot13_context->dev, &dev_attr_rot13_data_size);
device_remove_file(rot13_context->dev, &dev_attr_rot13_tx);
device_remove_file(rot13_context->dev, &dev_attr_rot13_rx);
misc_deregister(&rot13_miscdev);
pr_info("rot13 driver deregistered\n");
}
Note that the use of gotos for failure cases lets us simplify our exit logic, so I consider it acceptable here, but for anything else it should probably be avoided.
Our open/close remains almost the same.
static int open_rot13(struct inode *inode, struct file *filp) {
// if we don't want to implement seeking, we'd return nonseekable_open(inode, filp) here
return 0;
}
static int close_rot13(struct inode *inode, struct file *filp) {
return 0;
}
Now, it's finally time to explain the parameters in our filesystem signatures -- inode, filp, off
An inode represents a file from the point of view of the file system. Attributes of an inode are the size, rights, and times associated with the file. An inode uniquely identifies a file in a file system. An inode has no open-file state; it's like the class to a file's object. The inode is used to determine the major and minor of the device on which the operation is performed, and the file is used to determine the flags with which the file was opened, but also to save and access (later) private data. The inode structure contains, among much other information, an i_cdev field, which is a pointer to the structure that defines the character device (when the inode corresponds to a character device).
The filp parameter means "file pointer", and the struct file contains state like the mode the file is open in, the position in the file, the flags the file was opened with, and a special member private_data, where we could potentially allocate and store our rot13 context in order to allow for more than just one static copy. Here is an annotated set of members of *filp.
struct file{
....
f_mode, //which specifies read (FMODE_READ) or write (FMODE_WRITE);
f_flags, //which specifies the file opening flags (O_RDONLY, O_NONBLOCK, O_SYNC, O_APPEND, O_TRUNC, etc.);
f_op, //which specifies the operations associated with the file (pointer to the file_operations structure );
private_data, //a pointer that can be used by the programmer to store device-specific data; The pointer will be initialized to a memory location assigned by the programmer.
f_pos, //the offset within the file
...
};
loff_t *off is the offset in the file that we're reading or writing; it tracks filp->f_pos, and the driver is responsible for advancing it by the number of bytes it transfers.
Now, knowing this, let's define our read, write and seek.
static ssize_t read_rot13(struct file *filp, char __user *ubuf, size_t count, loff_t *off) {
	ssize_t ret = 0; // returning 0 signals EOF to the reader
	if (mutex_lock_interruptible(&rot13_mtx))
		return -ERESTARTSYS;
	if (count == 0 || *off >= rot13_context->data_size)
		goto out; // at or past the end of the buffer, nothing to read
	if (count > rot13_context->data_size - *off)
		count = rot13_context->data_size - *off;
	if (copy_to_user(ubuf, rot13_context->data + *off, count)) {
		ret = -EFAULT;
		goto out;
	}
	*off += count; // advance the file position, otherwise readers loop on the same bytes
	rot13_context->rx += count; // running total of bytes read by user-space
	ret = count; // report the bytes actually copied
out:
	mutex_unlock(&rot13_mtx);
	return ret;
}
static ssize_t write_rot13(struct file *filp, const char __user *ubuf, size_t count, loff_t *off) {
	ssize_t ret;
	char *start, *end; // for transforming
	if (mutex_lock_interruptible(&rot13_mtx))
		return -ERESTARTSYS;
	if (count == 0 || *off >= rot13_context->data_size) {
		dev_warn(rot13_context->dev, "nothing to write\n");
		ret = -EINVAL;
		goto out;
	}
	// is our buffer big enough? if not, only write what fits
	if (count > rot13_context->data_size - *off)
		count = rot13_context->data_size - *off;
	// copy from userspace memory into kernel memory
	if (copy_from_user(rot13_context->data + *off, ubuf, count)) {
		ret = -EFAULT;
		goto out;
	}
	// if we were doing real encryption, we'd do it on input so memory would be obfuscated
	start = rot13_context->data + *off;
	end = start + count;
	while (start < end) {
		*start = rot13_map(*start);
		++start;
	}
	*off += count; // advance the file position
	rot13_context->tx += count; // running total of bytes written by user-space (assumed meaning of tx)
	ret = count; // report the bytes actually stored
out:
	mutex_unlock(&rot13_mtx);
	return ret;
}
static loff_t llseek_rot13(struct file *filp, loff_t offset, int whence) {
	loff_t newpos;
	// todo: mention how we'd actually use *filp to track this and store data
	switch(whence) {
	case SEEK_SET:
		newpos = offset;
		break;
	case SEEK_CUR:
		newpos = filp->f_pos + offset;
		break;
	case SEEK_END: // relative to the end of our buffer, not the current position
		newpos = rot13_context->data_size + offset;
		break;
	default: /* unsupported whence */
		return -EINVAL;
	}
	if (newpos<0) return -EINVAL;
	filp->f_pos = newpos;
	return newpos;
}
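The three whence modes are easy to get subtly wrong, so it can help to check the arithmetic in user-space first. The function below is a made-up pure helper, not part of the driver, that follows the usual lseek convention: SEEK_SET is relative to the start, SEEK_CUR to the current position, and SEEK_END to the size of the data. It returns -1 where the driver would return -EINVAL.

```c
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical helper: compute the position a seek would land on.
 * cur is the current position, size is the size of the underlying data.
 * Returns the new position, or -1 for an unknown whence or a negative result. */
static long long seek_target(long long cur, long long size, long long offset, int whence)
{
	long long newpos;

	switch (whence) {
	case SEEK_SET:
		newpos = offset;
		break;
	case SEEK_CUR:
		newpos = cur + offset;
		break;
	case SEEK_END:
		newpos = size + offset;
		break;
	default:
		return -1;
	}
	return newpos < 0 ? -1 : newpos;
}
```

For a 256-byte buffer with the position at 3, seeking -6 from SEEK_END lands on 250, while seeking -10 from SEEK_CUR is rejected because it would go negative.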
You can see our mutex lock and unlock logic is fairly straightforward, but every exit path needs to unlock, so we consolidate that in our "out" labels.
Finally, let's define our device attribute show/set methods.
// ---- definition of sysfs show/set methods ----
static ssize_t rot13_data_size_show(struct device *dev, struct device_attribute *attr, char *buf) {
	int n;
	if (mutex_lock_interruptible(&rot13_mtx)) return -ERESTARTSYS;
	n = rot13_context->data_size;
	mutex_unlock(&rot13_mtx);
	// a show method formats the value into buf and returns the number of bytes written
	return scnprintf(buf, PAGE_SIZE, "%d\n", n);
}
static ssize_t rot13_data_size_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) {
	int ret, new_size;
	// validate the input before taking the lock so every later exit path unlocks
	if (count == 0 || count > 12) return -EINVAL;
	ret = kstrtoint(buf, 0, &new_size);
	if (ret) return ret;
	if (new_size <= 0) return -EINVAL;
	if (mutex_lock_interruptible(&rot13_mtx)) return -ERESTARTSYS;
	if (resize_rot13_data(rot13_context, new_size)) {
		pr_info("error resizing rot13 buffer to %d\n", new_size);
		ret = -ENOMEM;
		goto out;
	}
	ret = count; // a store method returns how many bytes of input it consumed
out:
	mutex_unlock(&rot13_mtx);
	return ret;
}
static ssize_t rot13_rx_show(struct device *dev, struct device_attribute *attr, char *buf) {
	int n;
	if (mutex_lock_interruptible(&rot13_mtx)) return -ERESTARTSYS;
	n = rot13_context->rx;
	mutex_unlock(&rot13_mtx);
	return scnprintf(buf, PAGE_SIZE, "%d\n", n);
}
static ssize_t rot13_tx_show(struct device *dev, struct device_attribute *attr, char *buf) {
	int n;
	if (mutex_lock_interruptible(&rot13_mtx)) return -ERESTARTSYS;
	n = rot13_context->tx;
	mutex_unlock(&rot13_mtx);
	return scnprintf(buf, PAGE_SIZE, "%d\n", n);
}
module_init(rot13_init);
module_exit(rot13_exit);
In the same makefile, simply change the obj-m line to obj-m += rot13_v1.o rot13_v2.o and run make again. Make sure to rmmod the old rot13 driver before you insmod v2.
Performing the same write and read operations, you will notice the file no longer behaves like a fifo. As an experiment, simply try seeking to the second word in "hello world" and read the rest.
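If you'd rather run that experiment from C than from the shell, a small user-space helper like the one below is one way to do it. The name read_from_offset is invented for this post; it only uses the standard open(), lseek(), read() and close() calls, so it works on any seekable file and not just on our device.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to cap-1 bytes from path starting at offset
 * and NUL-terminate them. Returns the number of bytes read, or -1 on error. */
static ssize_t read_from_offset(const char *path, off_t offset, char *out, size_t cap)
{
	ssize_t n;
	int fd;

	if (cap == 0)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* this is the call that ends up in the driver's .llseek */
	if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}
	n = read(fd, out, cap - 1);
	close(fd);
	if (n < 0)
		return -1;
	out[n] = '\0';
	return n;
}
```

On a regular file containing "hello world", calling it with offset 6 yields "world". Pointed at /dev/rot13 after writing the same phrase, the result should instead start with the rot13 of "world", followed by whatever else is in the device's buffer.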
Now look into /sys/class/misc/rot13/ and you'll see... nothing??
Despite all this effort, this is a simple defect in the sysfs exposure of our attributes that relates to kobjects. This is an "exercise" for the reader, but we will cover it when we resume device driver development in the next post!
code available on gitlab