
Porting btrfs-progs to Rust

·18 mins

Last weekend, I was itching to write some code. But finding a good project can be difficult. What I try to look for is something challenging enough to learn from, yet self-contained so I know what success looks like.

The idea that came to mind was porting btrfs-progs to Rust. For the uninitiated, btrfs is a copy-on-write (CoW) filesystem in Linux popular enough that Fedora has used it as the default filesystem since Fedora 33.

Btrfs has several neat features. If you are familiar with ZFS, you may recognize that some of their feature sets overlap. In my mind, Btrfs is like ZFS, but for everyday usage.

  • Copy-on-write: Writes are never in-place. Modified data goes to new blocks while old blocks remain intact. This enables cheap snapshots and atomic operations. You can take atomic snapshots of your entire filesystem and back them up, even incrementally (with btrfs send / btrfs receive).
  • Integrity: Btrfs checksums all data and metadata, so it can detect silent data corruption (bit rot). When combined with redundancy, it can automatically repair corrupted blocks.
  • Subvolumes: It supports subvolumes, which are lightweight, independently snapshottable directory trees that share the same underlying storage.
  • Multi-device: Btrfs filesystems can span multiple devices, which you can add and remove on-the-fly. It has support for different data redundancy profiles built in (such as single, RAID0, RAID1, RAID10), so you don’t need to use things like LVM.
  • Compression: Built-in transparent compression and deduplication.
  • Online maintenance: You can defragment and resize a btrfs filesystem while it is mounted and in use.

Since btrfs has capabilities that go beyond traditional filesystems, it comes with a userspace utility called btrfs. This command-line tool lets you interact with the features that are specific to it.

# Take a read-only snapshot of the root filesystem
btrfs subvolume snapshot -r / /snapshot

# Write out the entire snapshot to the file `snapshot.bin`
btrfs send /snapshot > snapshot.bin

This tool is part of btrfs-progs, and is unsurprisingly written in C. The tools work well and don’t have significant attack surface (they’re not exposed to the network or anything), so there’s no pressing need to rewrite them in a memory-safe language. But I wanted to do it anyway: it would help me understand how these tools actually work, and I wanted to see if I could create a simpler implementation that might be easier to maintain and test.

Making a plan #

Before starting this rewrite, I put some thought into how to approach it. When rewriting an existing codebase, there are two strategies: a “clean-room” rewrite, where you look only at the interface and effects of the tool but not its code, or a source-informed rewrite, where you study and translate the original code directly. The advantage of a clean-room approach is that your rewritten code is original work that you can license however you want. I chose the latter, since I figured it would be easier to actually study the existing code. That means, however, that my rewrite needs to carry the same license as the original: studying the code would likely influence the outcome, making my version a derived work.

I also decided to explore how useful an LLM could be for automating tedious tasks. I’m somewhat ambivalent about LLM usage: I’ve had bad experiences where they produced low-quality, incorrect code. At the same time, LLMs can be genuinely helpful for mechanical tasks, such as translating CLI command structures into clap declarations. I wanted to see if I could find the sweet spot of maintaining control of the architecture and quality of the codebase, while accelerating the process.

A first approach #

Initially, I wanted to understand how the original btrfs-progs codebase was organized. What’s the architecture? Does it use loose coupling with tidy, separated modules that I could translate individually to Rust and tie together using FFI? How does the compilation process work? How are the tools tested?

I started by cloning the repository and browsing around to understand what the pieces are and how they fit together. Here’s what I understood about the structure:

  • kernel-lib/: Low-level data structures and algorithms extracted from the Linux kernel, intended for reuse in userspace tools.
  • kernel-shared/: Btrfs-specific kernel code synchronized with the Linux kernel’s btrfs implementation. It implements core btrfs algorithms and on-disk format handling.
  • libbtrfs/: Library for interacting with btrfs filesystems (also exposed as a Python module).
  • common/: Shared utility code for btrfs tools (parsing, formatting, device scanning, filesystem utilities, etc.).
  • libbtrfsutil/: Higher-level library for managing btrfs filesystems with official Python bindings (subvolume, filesystem, and qgroup operations).
  • cmds/: Implementation code for the btrfs utility.
  • tune/: Implementation code for the btrfstune utility.
  • mkfs/: Implementation code for the mkfs.btrfs utility.

My initial plan was to keep this structure and port each piece individually to Rust, starting with kernel-lib and kernel-shared. I thought I could use bindgen to create bindings and then implement libbtrfs in Rust, eventually re-implementing the commands from cmds.

However, this approach wasn’t practical. First, there would be a lot of interdependencies, so I wouldn’t have a working MVP command for quite some time. Second, much of the code in kernel-lib and kernel-shared consisted of data structure implementations (linked lists, red-black trees, etc.) that Rust’s standard library already provides.

Second approach #

My alternative idea was to scrap the existing structure and build something new in Rust from scratch. To do this, I needed to understand how the btrfs CLI actually communicates with the kernel. The answer is mostly ioctl calls.

What are ioctl calls? #

Linux exposes certain operations when you open a file: you can read from it, write to it, and so on. But in Linux, and in UNIX derivatives in general, files can represent anything, not just physical files holding data. Files can also represent devices, for example. To control the device, such as setting parameters (think setting the speed and parity bits on a UART), you need some way to communicate with the driver. That is what ioctl does. It takes a file descriptor, an indication of what you want to communicate, and some data. If you like, you can think of it as an RPC call to the kernel driver. All you need to know is the ioctl number and what shape of arguments the driver expects.

How the btrfs CLI uses ioctls #

When a file lives on a btrfs filesystem, the ioctl syscall lets you send custom commands to the btrfs filesystem driver. For example, you can call the BTRFS_IOC_SNAP_CREATE ioctl to create a snapshot of the subvolume that the file descriptor refers to. There is also /dev/btrfs-control, a control device for operations that don’t require a mounted filesystem, like device scanning. The kernel also exposes some filesystem state through /sys/fs/btrfs.

So, the way the btrfs utility works is by figuring out what it is asked to do (by parsing its invocation arguments), making the appropriate ioctl calls to the kernel, and printing data.

Some of the subcommands of the btrfs utility can parse unmounted, raw btrfs filesystems and either read data from them or write to them, but I decided not to implement this yet and to focus only on mounted filesystems (which is my majority use-case for the utility anyway).

Structuring the codebase #

All the ioctl calls that btrfs supports, along with their argument structs, are defined in the kernel’s uapi headers: include/uapi/linux/btrfs.h and include/uapi/linux/btrfs_tree.h. So, I created a btrfs-uapi crate that uses bindgen to expose these ioctl calls and structs, with safe Rust wrappers around them.

I also created a btrfs-cli crate to house the actual CLI tool, which parses arguments and defers the real work to the safe wrappers in btrfs-uapi.

Implementation #

With this second approach, I made solid progress. I came up with a simple prototype by wrapping a few ioctl calls and implementing a single subcommand, and verified that it worked. Then all I had to do was incrementally get each of the other subcommands working.

To figure out which ioctl calls each subcommand needs to invoke, I looked at the source code for the commands, which lives in cmds inside the repository. LLMs were also helpful for figuring this out.

Wrapping ioctl calls #

To give you an idea of how this works, let me walk through one specific ioctl and explain how it’s wrapped. The headers only tell us which ioctl calls exist and what arguments they take, but what the CLI needs is nice, safe Rust wrappers. For example, many ioctl calls share the same argument struct but populate different fields, so I made sure that my wrappers only take the arguments that actually need to be populated, and that they are documented.

The journey starts in the kernel headers (specifically include/uapi/linux/btrfs.h), which define all available btrfs ioctl calls. Here’s what they look like (this is just a single struct definition):

/* report balance progress to userspace */
struct btrfs_balance_progress {
	__u64 expected;		/* estimated # of chunks that will be
				 * relocated to fulfill the request */
	__u64 considered;	/* # of chunks we have considered so far */
	__u64 completed;	/* # of chunks relocated so far */
};

struct btrfs_ioctl_balance_args {
	__u64 flags;				/* in/out */
	__u64 state;				/* out */

	struct btrfs_balance_args data;		/* in/out */
	struct btrfs_balance_args meta;		/* in/out */
	struct btrfs_balance_args sys;		/* in/out */

	struct btrfs_balance_progress stat;	/* out */

	__u64 unused[72];			/* pad to 1k */
};

#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 34, \
					struct btrfs_ioctl_balance_args)

First, there are struct definitions (I’ve omitted the definition for btrfs_balance_args for brevity). Second, there’s the BTRFS_IOC_BALANCE_PROGRESS definition.

Every ioctl belongs to a group (all btrfs-related ioctls use the group BTRFS_IOCTL_MAGIC), and each has a command number within that group. This ioctl has command number 34. The _IOR macro indicates that this ioctl reads from the kernel (returns data). Some ioctl calls write to the kernel (_IOW), and some do both (_IOWR). The final argument is the type of the struct that will be passed to the ioctl. The macro uses it to encode the struct’s size into the ioctl number, and the generated function takes a pointer to that struct (this is where the kernel leaves the data that we want to read).
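To make the encoding concrete, here is a small pure-Rust sketch of how _IOR packs the direction, magic, command number, and struct size into one number. The shift constants follow the common asm-generic ioctl layout; BTRFS_IOCTL_MAGIC is 0x94, and btrfs_ioctl_balance_args is padded to 1024 bytes.

```rust
// How _IOR builds an ioctl number (asm-generic/ioctl.h bit layout):
// bits 0-7: command number, bits 8-15: magic ("type"),
// bits 16-29: argument struct size, bits 30-31: direction.
const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = 8;
const IOC_SIZESHIFT: u32 = 16;
const IOC_DIRSHIFT: u32 = 30;
const IOC_READ: u32 = 2; // the _IOR direction bit

const fn ior(magic: u32, nr: u32, size: u32) -> u32 {
    (IOC_READ << IOC_DIRSHIFT)
        | (magic << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
        | (size << IOC_SIZESHIFT)
}

fn main() {
    // BTRFS_IOCTL_MAGIC = 0x94, command 34, 1024-byte argument struct.
    let balance_progress = ior(0x94, 34, 1024);
    println!("{:#010x}", balance_progress); // 0x84009422
}
```

This is also why changing the size of the argument struct changes the ioctl number itself: the size is part of the encoding, which is how the kernel rejects mismatched struct layouts.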

bindgen translates the structs to Rust for us, but it doesn’t translate the #define macros. Instead, we wrap these using the nix crate (which, despite the name, has nothing to do with Nix the package manager; it refers to Unix/POSIX systems). The wrapping looks like this:

nix::ioctl_read!(
    btrfs_ioc_balance_progress,
    BTRFS_IOCTL_MAGIC,
    34,
    btrfs_ioctl_balance_args
);

This macro generates an unsafe function that looks roughly like this:

unsafe fn btrfs_ioc_balance_progress(
    fd: c_int,
    arg: *mut btrfs_ioctl_balance_args,
) -> nix::Result<i32>;

Great! Now we can call this ioctl from Rust. It’s not perfect (we still have to use unsafe, and accessing the auto-translated struct isn’t always straightforward), but it’s a start.

Safe wrappers #

Next, we write a safe Rust wrapper around the raw ioctl. Since this is a read ioctl (it reads data from the kernel), the wrapper doesn’t take arguments beyond a file descriptor. We need to understand what the kernel will return into the struct, create proper Rust types for each field, and populate them from the raw struct contents:

bitflags! {
    /// State flags returned by the kernel (`btrfs_ioctl_balance_args.state`).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct BalanceState: u64 {
        /// A balance is currently running.
        const RUNNING    = BTRFS_BALANCE_STATE_RUNNING as u64;
        /// A pause has been requested.
        const PAUSE_REQ  = BTRFS_BALANCE_STATE_PAUSE_REQ as u64;
        /// A cancellation has been requested.
        const CANCEL_REQ = BTRFS_BALANCE_STATE_CANCEL_REQ as u64;
    }
}

/// Progress counters returned by the kernel for an in-progress balance.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BalanceProgress {
    /// Estimated number of chunks that will be relocated.
    pub expected: u64,
    /// Number of chunks considered so far.
    pub considered: u64,
    /// Number of chunks relocated so far.
    pub completed: u64,
}

/// Query the current balance state and progress on the filesystem referred to by `fd`.
///
/// Returns a [`BalanceState`] bitflags value indicating whether a balance is
/// running, paused, or being cancelled, along with a [`BalanceProgress`]
/// containing the current counters.
pub fn balance_progress(fd: BorrowedFd) -> nix::Result<(BalanceState, BalanceProgress)> {
    let mut args: btrfs_ioctl_balance_args = unsafe { std::mem::zeroed() };

    unsafe {
        btrfs_ioc_balance_progress(fd.as_raw_fd(), &mut args)?;
    }

    let state = BalanceState::from_bits_truncate(args.state);
    let progress = BalanceProgress {
        expected: args.stat.expected,
        considered: args.stat.considered,
        completed: args.stat.completed,
    };

    Ok((state, progress))
}

Now we’re in a place where we can easily call balance_progress on a file descriptor and get useful data back.

The btrfs-uapi crate is structured so that all raw bindings (generated from C code and manually wrapped ioctls using the nix crate) are in uapi/src/raw.rs. Safe wrappers for each group of ioctl calls are in appropriately named modules. This particular ioctl, for example, is in uapi/src/balance.rs. That way, it’s easy to find what you’re looking for.

Bindgen gotchas #

bindgen does a good job translating C structs to Rust, but there are two subtle issues worth knowing about.

The first is flexible array members. Some btrfs structs end with a trailing array of variable length (such as the spaces[] field of btrfs_ioctl_space_args). bindgen represents such a field using a special __IncompleteArrayField<T> type that has zero size. To actually use it, you need to allocate a larger buffer manually and access the entries through the as_slice(n) method it provides.
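To sketch the allocation pattern, here is a simplified, self-contained version. The struct names are stand-ins, not the real bindgen output (which would use __IncompleteArrayField and its as_slice method instead of the raw pointer arithmetic shown here):

```rust
use std::mem::size_of;

// Simplified stand-ins: the real btrfs_ioctl_space_args ends with a
// zero-sized __IncompleteArrayField of btrfs_ioctl_space_info entries.
#[repr(C)]
struct SpaceArgs {
    space_slots: u64,
    total_spaces: u64,
    // the trailing entries live directly after this header in memory
}

#[repr(C)]
#[derive(Clone, Copy)]
struct SpaceInfo {
    flags: u64,
    total_bytes: u64,
    used_bytes: u64,
}

fn main() {
    let slots = 4usize;
    // Allocate one buffer big enough for the header plus `slots` entries.
    // Backing it with u64s guarantees 8-byte alignment for the header.
    let bytes = size_of::<SpaceArgs>() + slots * size_of::<SpaceInfo>();
    let mut buf = vec![0u64; bytes.div_ceil(8)];
    let args = buf.as_mut_ptr() as *mut SpaceArgs;

    unsafe {
        (*args).space_slots = slots as u64;
        // The ioctl would fill the trailing entries; fake one here.
        let entries = args.add(1) as *mut SpaceInfo;
        entries.write_unaligned(SpaceInfo { flags: 1, total_bytes: 100, used_bytes: 40 });

        let first = entries.read_unaligned();
        println!("{}/{}", first.used_bytes, first.total_bytes); // prints 40/100
    }
}
```

The key point is that the zero-sized field only marks where the entries begin; sizing the buffer correctly is entirely on you.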

The second gotcha is packed structs. Some btrfs structs are declared with __attribute__((packed)) in C, which means the compiler will not insert padding between fields. bindgen faithfully replicates this, but Rust has a rule: you cannot take a reference to a field of a packed struct because the field may not be aligned, and misaligned references are undefined behaviour. The fix is to copy the field value to a local variable first.

This trips up anyone who tries to derive Debug on a type containing a packed struct, since derive(Debug) generates references to each field internally.
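A minimal reproduction of the rule, with an illustrative struct rather than the real bindgen output:

```rust
// Mimics a packed on-disk struct (fields simplified for illustration).
#[repr(C, packed)]
struct DevItem {
    kind: u8,
    total_bytes: u64, // starts at byte offset 1: misaligned for u64
}

fn main() {
    let item = DevItem { kind: 3, total_bytes: 42 };

    // This would be a compile error (E0793): formatting takes a reference,
    // and a reference to a misaligned field is undefined behaviour.
    // println!("{}", item.total_bytes);

    // Copy the values to properly aligned locals first.
    let kind = item.kind;
    let total = item.total_bytes;
    println!("{}: {}", kind, total); // prints 3: 42
}
```

Note that the struct is 9 bytes, with no padding between the u8 and the u64; that is exactly what packing buys you, and exactly why the u64 ends up misaligned.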

Implementing the CLI #

In the CLI crate, I use clap to define command-line parsing declaratively. I really love this crate: it makes it easy to build consistent CLI tools with good error reporting.

/// Show status of running or paused balance operation.
#[derive(Parser, Debug)]
pub struct BalanceStatusCommand {
    pub path: PathBuf,
}

/// Open a path as a read-only file descriptor, suitable for passing to ioctls.
fn open_path(path: &PathBuf) -> Result<File> {
    File::open(path).with_context(|| format!("failed to open '{}'", path.display()))
}

impl Runnable for BalanceStatusCommand {
    fn run(&self, _format: Format, _dry_run: bool) -> Result<()> {
        let file = open_path(&self.path)?;

        match balance_progress(file.as_fd()) {
            Ok((state, progress)) => {
                if state.contains(BalanceState::RUNNING) {
                    print!("Balance on '{}' is running", self.path.display());
                    if state.contains(BalanceState::CANCEL_REQ) {
                        println!(", cancel requested");
                    } else if state.contains(BalanceState::PAUSE_REQ) {
                        println!(", pause requested");
                    } else {
                        println!();
                    }
                } else {
                    println!("Balance on '{}' is paused", self.path.display());
                }

                let pct_left = if progress.expected > 0 {
                    100.0 * (1.0 - progress.completed as f64 / progress.expected as f64)
                } else {
                    0.0
                };

                println!(
                    "{} out of about {} chunks balanced ({} considered), {:3.0}% left",
                    progress.completed, progress.expected, progress.considered, pct_left
                );

                Ok(())
            }
            Err(e) if e == Errno::ENOTCONN => {
                println!("No balance found on '{}'", self.path.display());
                Ok(())
            }
            Err(e) => Err(e)
                .with_context(|| format!("balance status on '{}' failed", self.path.display())),
        }
    }
}

The CLI crate is structured so that there is one module per top-level subcommand group. For example, the parent btrfs balance subcommand lives in cli/src/balance.rs, and the subcommands it has live in cli/src/balance/, with one file per subcommand. That way, you can easily find where a CLI invocation is implemented in the codebase.

I found LLMs useful for a lot of the mechanical work in this section: generating the nix wrappers for raw ioctls, producing safe wrappers, and translating CLI command structures into clap declarations. The key was breaking tasks into small, scoped chunks and providing feedback quickly. There were a few occasions where they did something incorrect, but that didn’t happen often.

Figuring out how to test it #

The last (and perhaps most important) aspect of porting this code is figuring out how to test it. What makes this difficult is that the tests interact with the kernel (the codebase works by issuing ioctl syscalls), so they can only run on Linux (ideally with a current kernel version). Also, many of these APIs require elevated privileges, so the tests must run as root. Finally, a lot of CI systems run jobs in Docker containers, which by default block many syscalls, including some that we need to test.

Other considerations are testing on different architectures (such as arm64) and making sure that the musl build works as expected (musl differs from glibc in some ways).

Unit tests #

Unit tests don’t need the kernel at all. The most important ones verify that the Rust struct layouts produced by bindgen match the sizes asserted in the C headers. Every btrfs ioctl struct has a corresponding static_assert(sizeof(...) == N) in the C header, and I mirror those in Rust:

#[test]
fn assert_struct_sizes() {
    assert_eq!(size_of::<btrfs_ioctl_vol_args>(), 4096);
    assert_eq!(size_of::<btrfs_ioctl_balance_args>(), 1024);
    assert_eq!(size_of::<btrfs_scrub_progress>(), 120);
    // ...
}

If bindgen or the kernel headers ever produce the wrong layout, this test catches it immediately, without needing a real filesystem. Technically, we don’t need to write these: bindgen generates layout tests alongside the bindings that verify struct sizes and field offsets. But those only check the bindings against what bindgen itself computed, while these tests check them against what the C source code asserts (what if bindgen computed a size incorrectly?).

Integration tests #

Integration tests do need the kernel, and they also need root privileges (or at least CAP_SYS_ADMIN) to call btrfs ioctls. The challenge here is that you don’t want to run sudo cargo test, because then compilation happens as root and your target/ folder ends up owned by root.

I tried building the tests and then using setcap to set the CAP_SYS_ADMIN capability on the test binaries before running them, but that did not work. So instead, I build the tests (without running them) and then manually run them with sudo. This is implemented in the Justfile test target.
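A sketch of what such a Justfile target might look like (the recipe in the repository may differ; locating the binaries is glossed over here, but cargo test --no-run --message-format=json reports their paths as executable entries):

```
test:
    # Build the test binaries as the regular user, so target/ stays yours.
    cargo test --no-run
    # Run each prebuilt binary as root; --ignored enables the privileged tests.
    sudo ./target/debug/deps/<test-binary> --ignored --test-threads=1
```

This keeps compilation unprivileged while only the actual test execution happens as root.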

The integration tests typically need some btrfs filesystem to work with. So I came up with some helper structs that would create file-backed devices, format them, create loopback devices and then mount them, with Drop implementations that would clean them up. Here’s what that looks like:

/// fs_info should reflect a newly added device.
#[test]
#[ignore = "requires elevated privileges"]
fn filesystem_info_after_add() {
    let td = tempfile::tempdir().unwrap();
    let f1 = BackingFile::new(td.path(), "d1.img", 300_000_000);
    f1.mkfs();
    let lo1 = LoopbackDevice::new(f1);
    let mnt = Mount::new(lo1, td.path());

    let info1 = fs_info(mnt.fd()).expect("fs_info failed");
    assert_eq!(info1.num_devices, 1);
    assert_eq!(info1.max_id, 1);

    let f2 = BackingFile::new(td.path(), "d2.img", 300_000_000);
    let lo2 = LoopbackDevice::new(f2);
    let dev2_cpath = CString::new(lo2.path().to_str().unwrap()).unwrap();
    device_add(mnt.fd(), &dev2_cpath).expect("device_add failed");

    let info2 = fs_info(mnt.fd()).expect("fs_info after add failed");
    assert_eq!(info2.num_devices, 2);
    assert_eq!(info2.max_id, 2);
    assert_eq!(
        info1.uuid, info2.uuid,
        "uuid should not change after adding a device"
    );
}

You may notice that this test has an ignore attribute set. This is so that it is skipped during a regular cargo test run (which is typically not run as root, so the test would fail). When you run the tests via just test, it passes --ignored --test-threads=1 to enable them (the --test-threads=1 is because running a lot of filesystem operations simultaneously can cause flakiness).

With these helper structs, the testing code is actually reasonably easy to follow and understand. If a test panics, Rust still ensures that the Drop handlers get executed, so that you don’t have a bunch of stray filesystems mounted.

I haven’t yet written tests for the CLI crate. I felt like testing the btrfs-uapi crate has a higher priority, but some of the helpers here can be repurposed when doing that.

Conclusion #

I was able to rewrite a substantial portion of btrfs-progs in Rust over the course of a few days. There are only a few subcommands that I did not get around to implementing, either because they are too complex, or because they require parsing an unmounted, raw btrfs filesystem. Much of that success is thanks to excellent supporting crates (bindgen, nix, clap, bitflags), and much of the mechanical work was offloaded to an LLM.

I mostly used Claude Sonnet 4.6 for the LLM-related tasks, and I learned a lot in the process. For example, long-running sessions can cause the context to balloon; I found it better to have multiple quick sessions. I also found that Claude Haiku 4.5 made a lot more silly mistakes. My main takeaway is that these models can get things working, but it is very important to review the code they produce.

The next steps are to write tests for the CLI crate, to implement the few missing subcommands, and to write a crate that is able to parse and write to unmounted, raw btrfs filesystems.

You can try it #

The source code is available at github.com/rustutils/btrfs-progrs. Check it out; contributions are welcome.

If you just want to use it, you can install it with cargo:

cargo install btrfs-cli --locked

Please note that for the time being this is an experimental release, and there will likely be some bugs to fix before you should trust it with important data. If you do find issues, feel free to raise them.

If you want to interact with btrfs filesystems from code, you can also use the btrfs-uapi crate directly.

Reading #

Daniel Xu wrote a five-part blog post series about btrfs internals, which is worth reading if you want to know how btrfs works on-disk.

Some other noteworthy reads: