This is the third of a multipart series that explains the basics of btrfs’s on-disk format.
At the end of this series, we’ll have a program that can print out the absolute path of every regular file in an unmounted btrfs filesystem image without external libraries or ioctl(2)
calls.
Example code is available here.
As explained in part 2, btrfs stores nearly everything on-disk in B-trees. And as promised, I’ll now describe the B-tree data format. First, an example tree:
Each node in a btrfs B-tree is prefixed with a header. The header records the node’s “level”. Level 0 means the node is a leaf node and stores a payload. Level > 0 means the node is an internal node and stores pointers to children nodes. The header also stores the number of “items” the node contains where an “item” is either a pointer to child node if level > 0
, else, information on where to find the payload in the node. Recall that each item in a node is sorted by the associated BtrfsKey
which allows for efficient binary searches. There’s also some other data but it’s not too important to us.
In our example, root
and node 0
contain BtrfsKeyPtr
s because they’re not leaf nodes. leaf 0
and leaf 1
contain BtrfsItem
s because they are leaf nodes.
Now that we understand how trees are laid out on disk, let’s process the rest of the chunk tree.
First, let’s define the necessary structures:
#[repr(C, packed)]
#[derive(Copy, Clone)]
pub struct BtrfsHeader {
pub csum: [u8; BTRFS_CSUM_SIZE],
pub fsid: [u8; BTRFS_FSID_SIZE],
pub bytenr: u64,
pub flags: u64,
pub chunk_tree_uuid: [u8; BTRFS_UUID_SIZE],
pub generation: u64,
pub owner: u64,
pub nritems: u32,
pub level: u8,
}
#[repr(C, packed)]
#[derive(Copy, Clone)]
pub struct BtrfsKeyPtr {
pub key: BtrfsKey,
pub blockptr: u64,
pub generation: u64,
}
#[repr(C, packed)]
#[derive(Copy, Clone)]
pub struct BtrfsItem {
pub key: BtrfsKey,
pub offset: u32,
pub size: u32,
}
Note that BtrfsItem::offset
is the offset from the end of the associated BtrfsHeader
that we can find the payload for the BtrfsItem
.
Although not strictly necessary, we also define BtrfsNode
and BtrfsLeaf
as the following:
#[repr(C, packed)]
#[derive(Copy, Clone)]
pub struct BtrfsNode {
pub header: BtrfsHeader,
// `BtrfsKeyPtr`s begin here
}
#[repr(C, packed)]
#[derive(Copy, Clone)]
pub struct BtrfsLeaf {
pub header: BtrfsHeader,
// `BtrfsItem`s begin here
}
We don’t need these structure definitions because all it tells us is that every node in the on-disk B-tree starts with BtrfsHeader
. After parsing the header and reading BtrfsHeader::level
, we can infer what follows the header.
To walk any tree, we need to start at the root node. The superblock contains the logical offset the chunk tree root lives at. To read it:
fn read_chunk_tree_root(
: &File,
file: u64,
chunk_root_logical: &ChunkTreeCache,
cache-> Result<Vec<u8>> {
) let size = cache
.mapping_kv(chunk_root_logical)
.ok_or_else(|| anyhow!("Chunk tree root not bootstrapped"))?
.0
.size;
let physical = cache
.offset(chunk_root_logical)
.ok_or_else(|| anyhow!("Chunk tree root not bootstrapped"))?;
let mut root = vec![0; size as usize];
.read_exact_at(&mut root, physical)?;
file
Ok(root)
}
where chunk_root_logical
is BtrfsSuperblock::chunk_root
.
Walking the actual tree looks like a traditional recursive tree-walking algorithm:
fn read_chunk_tree(
: &File,
file: &[u8],
root: &mut ChunkTreeCache,
chunk_tree_cache: &BtrfsSuperblock,
superblock-> Result<()> {
) let header = tree::parse_btrfs_header(root)
.expect("failed to parse chunk root header");
tree::parse_btrfs_header
is a simple helper function that extracts the BtrfsHeader
out of root
and returns a reference to the header.
// Level 0 is leaf node, !0 is internal node
if header.level == 0 {
let items = tree::parse_btrfs_leaf(root)?;
If we’re at level 0, we know we’re looking at a leaf node. So we use tree::parse_btrfs_leaf
to extract the BtrfsItem
s.
for item in items {
if item.key.ty != BTRFS_CHUNK_ITEM_KEY {
continue;
}
We skip anything that isn’t a chunk item. The chunk tree also contains BTRFS_DEV_ITEM_KEY
s which help map physical offsets to logical offsets. However, we only need chunk items for our purpose so we skip everything else.
let chunk = unsafe {
// `item.offset` is offset from data portion of `BtrfsLeaf` where
// associated `BtrfsChunk` starts
&*(root
.as_ptr()
.add(std::mem::size_of::<BtrfsHeader>() + item.offset as usize)
as *const BtrfsChunk)
};
As mentioned earlier, BtrfsItem::offset
is the offset from the end of the BtrfsHeader
. The above code does the proper math to pull out the BtrfsChunk
associated with the current item
.
.insert(
chunk_tree_cache{
ChunkTreeKey : item.key.offset,
start: chunk.length,
size},
{
ChunkTreeValue : chunk.stripe.offset,
offset},
;
)}
Finally, we add the chunk entry into our chunk tree cache.
} else {
let ptrs = tree::parse_btrfs_node(root)?;
for ptr in ptrs {
let physical = chunk_tree_cache
.offset(ptr.blockptr)
.ok_or_else(|| anyhow!("Chunk tree node not mapped"))?;
let mut node = vec![0; superblock.node_size as usize];
.read_exact_at(&mut node, physical)?;
file, &node, chunk_tree_cache, superblock)?;
read_chunk_tree(file}
}
If we see level != 0
, we know we’re looking at an internal node. So we use the tree::parse_btrfs_node
helper to parse an internal node. Once we have the BtrfsKeyPtr
s, we read the node the key points to and recursively call read_chunk_tree
.
Ok(())
}
If we haven’t errored out by the end, it means we successfully walked the chunk tree.
Now that we’ve loaded the entire chunk tree into our cache, we can move onto walking the trees that contain the information we actually care about. In the next post, we’ll extract the filesystem tree root from the root tree root.