From prime step to OCI layer

Rockcraft is a tool that creates OCI images using the same concepts and mechanisms that create snaps and charms: the lifecycle language from Craft Parts. There is a significant difference between the way the Craft lifecycle works and the OCI specification, and one of Rockcraft’s jobs is to bridge the gap between these two worlds. This page describes how this is accomplished.

Note

It is not necessary to know these details to use the tool effectively, but they might illuminate some concepts and help understand why the contents of a given rock are the way they are.

Consider the following snippet of a rockcraft.yaml that creates a rock containing a bare-bones Python 3.10 interpreter:

# (...)
base: ubuntu@22.04

parts:
  python-part:
    plugin: nil
    stage-packages:
      - python3-minimal

This rock has Ubuntu 22.04 as its base and includes python3-minimal. Conceptually, this means that at build time Craft Parts will pull in the python3-minimal Ubuntu package and whatever dependencies it needs to work. Indeed, if we run rockcraft prime --shell-after, we can see the final contents ready to be packed in the prime directory. This is the directory available at build time through the ${CRAFT_PRIME} environment variable:

$ rockcraft prime --shell-after
$ cd ../prime
$ ls
bin  etc  lib  lib64  sbin  usr  var
$ ls usr/bin/
debconf               debconf-copydb          debconf-show  dpkg-divert              dpkg-realpath      dpkg-trigger  py3clean     python3
debconf-apt-progress  debconf-escape          dpkg          dpkg-maintscript-helper  dpkg-split         perl          py3compile   python3.10
debconf-communicate   debconf-set-selections  dpkg-deb      dpkg-query               dpkg-statoverride  perl5.34.0    py3versions  update-alternatives

As we can see, the prime directory has the contents of the python3-minimal package but also many of its dependencies, direct and otherwise. Once the lifecycle is finished, Rockcraft packs the contents of the prime directory as a new OCI layer, directly as if the prime directory were the filesystem root /.

Note

The following sections only apply to rocks with Ubuntu bases. Bare rocks need neither prime pruning nor usrmerge handling.

Pruning the prime directory

One consequence of the inclusion of a stage-package’s dependencies is that the prime directory ends up having many files that the base Ubuntu layer already has. This can be seen, for example, by using a tool like Dive:

[Figure: Dive reporting an inefficient image]

What Dive tells us is that about 60 MB worth of files are duplicated between the base Ubuntu 22.04 layer and the “primed” layer. For example, the file /usr/lib/x86_64-linux-gnu/libcrypto.so.3 exists both in the base layer (as part of the base Ubuntu system) and in the primed layer (pulled in because it belongs to a package that is an indirect dependency of python3-minimal).

Starting from version 1.1.0, Rockcraft “prunes” those files in the prime directory that also exist, with the same contents, ownership and permissions, in the base layer. The end result is semantically the same, because the layers are “stacked” together when creating containers from the rock. This “pruning” can be seen in the logs generated by Rockcraft:

(...)
Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Sc/Gran.pl as it exists on the base
Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Bc/EN.pl as it exists on the base
Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/PatSyn/Y.pl as it exists on the base
Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Dt/Init.pl as it exists on the base
Pruning: /root/prime/usr/share/perl5/Debconf/Element/Noninteractive/Multiselect.pm as it exists on the base
(...)
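The pruning rule described above can be sketched in a few lines of Python. This is an illustration only, not Rockcraft's actual implementation: the function names prune_prime and same_metadata are hypothetical, the base layer is assumed to be available as an extracted directory, and only regular files are considered.

```python
import filecmp
import os
from pathlib import Path


def same_metadata(a: Path, b: Path) -> bool:
    """Compare ownership and mode bits of two paths without following symlinks."""
    sa, sb = a.lstat(), b.lstat()
    return (sa.st_uid, sa.st_gid, sa.st_mode) == (sb.st_uid, sb.st_gid, sb.st_mode)


def prune_prime(prime: Path, base: Path) -> list:
    """Delete regular files in `prime` that are identical to the file at the
    same relative path in `base`. Returns the pruned relative paths, sorted.

    Hypothetical helper: `base` is assumed to be an extracted copy of the
    base layer's filesystem.
    """
    pruned = []
    for root, _dirs, files in os.walk(prime):
        for name in files:
            candidate = Path(root) / name
            relative = candidate.relative_to(prime)
            existing = base / relative
            # This sketch only handles regular files that exist on both sides.
            if candidate.is_symlink() or existing.is_symlink():
                continue
            if not existing.is_file():
                continue
            # Prune only when contents, ownership and permissions all match.
            if same_metadata(candidate, existing) and filecmp.cmp(
                candidate, existing, shallow=False
            ):
                candidate.unlink()
                pruned.append(relative.as_posix())
    return sorted(pruned)
```

A file that differs from the base in any of the three properties is kept, since dropping it would change what the stacked layers present to the container.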

usrmerge and the lifecycle layer

After pruning, the contents of the prime directory are packed as a new OCI layer. In concrete terms, the files and directories are added to a tar archive, and each file (or directory) is stored together with the “destination” path that it should have when the archive is extracted.

In most cases, the file’s original path (relative to the root of the archive) and its destination path once extracted are the same, so the file that exists in the prime directory as a/b/c/file.txt should be extracted as a/b/c/file.txt.
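As a rough illustration of this default case, the sketch below uses Python's standard tarfile module, where the arcname argument of TarFile.add plays the role of the “destination” path. The function name pack_directory is hypothetical and the code is not taken from Rockcraft; it only shows a directory being packed with every entry keeping its path relative to the directory root.

```python
import io
import os
import tarfile
from pathlib import Path


def pack_directory(src: Path) -> bytes:
    """Pack everything under `src` into an in-memory tar archive.

    Each entry's archive name (its "destination" path) is simply its
    path relative to `src`. Hypothetical helper for illustration.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for root, dirs, files in os.walk(src):
            dirs.sort()  # make the walk order deterministic
            for name in sorted(dirs) + sorted(files):
                path = Path(root) / name
                arcname = path.relative_to(src).as_posix()
                # recursive=False: add only this entry; os.walk visits the rest.
                archive.add(str(path), arcname=arcname, recursive=False)
    return buffer.getvalue()
```

With this scheme a file stored at a/b/c/file.txt under the source directory is recorded in the archive as a/b/c/file.txt, alongside entries for a, a/b and a/b/c.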

However, there are cases where this “destination” path should be changed. For example, consider again the contents of the previous rock’s prime directory:

$ ls -l
total 5
drwxr-xr-x 2 root root  3 Dec  7 20:30 bin
drwxr-xr-x 9 root root 10 Dec  7 20:30 etc
drwxr-xr-x 4 root root  4 Dec  7 20:30 lib
drwxr-xr-x 2 root root  2 Dec  7 20:30 lib64
drwxr-xr-x 2 root root  2 Dec  7 20:30 sbin
drwxr-xr-x 7 root root  7 Dec  7 20:30 usr
drwxr-xr-x 4 root root  4 Dec  7 20:30 var
$ ls bin/
pebble

So bin/ is a regular directory and contains the pebble binary, to serve as the rock’s entrypoint. However, consider the base directory structure of an Ubuntu system:

$ ls -l /
total 84
lrwxrwxrwx   1 root root     7 ago 27  2022 bin -> usr/bin
drwxr-xr-x   5 root root  4096 nov 27 13:59 boot
drwxrwxr-x   2 root root  4096 ago 27  2022 cdrom
drwxr-xr-x  20 root root  5900 dez  7 19:57 dev
drwxr-xr-x 148 root root 12288 dez  7 15:15 etc
drwxr-xr-x   3 root root  4096 ago 27  2022 home
lrwxrwxrwx   1 root root     7 ago 27  2022 lib -> usr/lib
lrwxrwxrwx   1 root root     9 ago 27  2022 lib32 -> usr/lib32
lrwxrwxrwx   1 root root     9 ago 27  2022 lib64 -> usr/lib64
lrwxrwxrwx   1 root root    10 ago 27  2022 libx32 -> usr/libx32

bin is actually a symbolic link to usr/bin. This is the usrmerge, and it’s been present in Ubuntu for many years now. Note that many other entries are also symlinks, like lib (to usr/lib) and lib64 (to usr/lib64).

These two filesystems interact in a surprising way when stacked as OCI layers. Suppose bin/pebble is added to the layer’s archive as bin/pebble, together with an entry for the bin/ directory (which is a regular directory in the prime contents). Once the two layers are stacked together in a container, the bin/ directory from the “prime layer” overwrites the bin -> usr/bin symlink from the “base layer”. This breaks everything that assumes the base binaries from usr/bin/ are always accessible through bin/.

This issue is made much worse if, instead of breaking bin/, we break the lib*/ symlinks. Consider:

$ ldd /bin/bash
linux-vdso.so.1 (0x00007ffdf2af4000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f6053cbd000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6053a00000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6053e6b000)

The bash binary links to multiple dynamic libraries, but has a hardcoded path to the /lib64/ld-linux-x86-64.so.2 dynamic loader. This loader is the program that does the actual finding of dynamic dependencies at runtime, and in an Ubuntu system its actual location is at /usr/lib64/ld-linux-x86-64.so.2. So if the /lib64 -> usr/lib64 symlink is broken because the prime directory contains lib64 as a regular directory, then the vast majority of the binaries in the final rock’s base system will simply fail to run because their loader is no longer available at /lib64/ld-linux-x86-64.so.2.

To fix this, Rockcraft will take the base system into account when creating the archive for the prime layer. For instance, when considering bin/pebble, Rockcraft will:

  1. Skip adding bin/ as a regular directory, to avoid breaking the base system, and

  2. Add bin/pebble as usr/bin/pebble in the layer archive.

This can be seen in the logs:

(...)
Creating new layer
(...)
Skipping /root/prime/bin because it exists as a symlink on the lower layer
(...)
Adding to layer: /root/prime/bin/pebble as 'usr/bin/pebble'
(...)
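The two rules above (skip directories that are symlinks on the base, and rewrite paths that pass through such symlinks) can be sketched as a small path-mapping function. This is a simplified illustration under stated assumptions, not Rockcraft's code: the name layer_path and its is_dir parameter are hypothetical, the base layer is assumed to be extracted at base, and symlink targets are assumed to stay inside the layer's root.

```python
import os
import posixpath
from pathlib import Path, PurePosixPath
from typing import Optional


def layer_path(relative: str, base: Path, is_dir: bool) -> Optional[str]:
    """Map a path relative to the prime directory to its path in the layer archive.

    Returns None when the entry is a directory that exists as a symlink in
    `base`, meaning it should not be added to the archive at all.
    Hypothetical helper for illustration.
    """
    parts = PurePosixPath(relative).parts
    resolved = PurePosixPath()
    for index, part in enumerate(parts):
        candidate = base.joinpath(*resolved.parts, part)
        is_last = index == len(parts) - 1
        if is_last:
            if is_dir and candidate.is_symlink():
                return None  # e.g. bin/ when the base has bin -> usr/bin
            resolved = resolved / part
        elif candidate.is_symlink():
            # A parent component is a symlink on the base: follow it.
            target = os.readlink(candidate)
            if target.startswith("/"):
                joined = target.lstrip("/")  # absolute target, re-rooted
            else:
                joined = posixpath.join(resolved.as_posix(), target)
            resolved = PurePosixPath(posixpath.normpath(joined))
        else:
            resolved = resolved / part
    return resolved.as_posix()
```

For a base where bin is a symlink to usr/bin, layer_path("bin", base, is_dir=True) returns None and layer_path("bin/pebble", base, is_dir=False) returns "usr/bin/pebble", matching the two steps listed above. Paths that do not cross a base symlink, such as etc or usr/bin/python3, are returned unchanged.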

Finally, as mentioned at the beginning, none of this applies to rocks with bare bases, as there is no base system to contain duplicates that need to be pruned or symbolic links that need to be taken into account.