From prime step to OCI layer
============================

Rockcraft is a tool that creates OCI images using the same concepts and
mechanisms that create snaps and charms: the lifecycle language from Craft
Parts. There is a significant difference between the way the Craft lifecycle
works and the OCI specification, and one of Rockcraft's jobs is to bridge the
gap between these two worlds. This page describes how this is accomplished.

.. note::

    It is not necessary to know these details to use the tool effectively, but
    they might illuminate some concepts and help understand *why* the contents
    of a given rock are the way they are.

Consider the following snippet of a ``rockcraft.yaml`` that creates a rock
containing a bare-bones Python 3.10 interpreter:

.. code-block:: yaml

    # (...)
    base: ubuntu@22.04
    parts:
      python-part:
        plugin: nil
        stage-packages:
          - python3-minimal

This rock has Ubuntu 22.04 as its base and includes ``python3-minimal``.
Conceptually, this means that at build time Craft Parts will pull in the
``python3-minimal`` Ubuntu package and whatever dependencies it needs to work.
Indeed, if we run ``rockcraft prime --shell-after``, we can see the final
contents ready to be packed in the prime directory - this is the directory
available at build-time through the ``${CRAFT_PRIME}`` environment variable:

.. code-block:: bash

    $ rockcraft prime --shell-after
    $ cd ../prime
    $ ls
    bin  etc  lib  lib64  sbin  usr  var
    $ ls usr/bin/
    debconf               debconf-copydb          debconf-show  dpkg-divert              dpkg-realpath      dpkg-trigger  py3clean     python3
    debconf-apt-progress  debconf-escape          dpkg          dpkg-maintscript-helper  dpkg-split         perl          py3compile   python3.10
    debconf-communicate   debconf-set-selections  dpkg-deb      dpkg-query               dpkg-statoverride  perl5.34.0    py3versions  update-alternatives

As we can see, the prime directory has the contents of the ``python3-minimal``
package but also many of its dependencies, direct and otherwise.

Once the lifecycle is finished, Rockcraft packs the contents of the prime
directory as a new OCI layer, directly as if the prime directory were the
filesystem root ``/``.

.. note::

    The following sections only apply to rocks with Ubuntu bases - ``bare``
    rocks don't need prime pruning nor ``usrmerge`` handling.

Pruning the ``prime`` directory
-------------------------------

One consequence of the inclusion of a ``stage-package``'s dependencies is that
the prime directory ends up having many files that the base Ubuntu layer
already has. This can be seen, for example, by using a tool like `Dive`_:

.. figure:: /_static/dive-efficiency.png
    :width: 75%
    :align: center
    :alt: Dive reporting an inefficient image

What ``dive`` tells us is that about ``60 MB`` worth of files are *duplicated*
between the base Ubuntu 22.04 layer and the "primed" layer: for example, the
file ``/usr/lib/x86_64-linux-gnu/libcrypto.so.3`` exists both in the base layer
(as part of the base Ubuntu system) and in the primed layer (pulled in because
it belongs to a package that is an indirect dependency of
``python3-minimal``).

Starting from version ``1.1.0``, Rockcraft "prunes" those files in the prime
directory that also exist, with the same contents, ownership and permissions,
in the base layer. The end result is semantically the same, because the layers
are "stacked" together when creating containers from the rock. This "pruning"
can be seen in the logs generated by Rockcraft:

.. code-block:: text

    (...)
    Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Sc/Gran.pl as it exists on the base
    Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Bc/EN.pl as it exists on the base
    Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/PatSyn/Y.pl as it exists on the base
    Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Dt/Init.pl as it exists on the base
    Pruning: /root/prime/usr/share/perl5/Debconf/Element/Noninteractive/Multiselect.pm as it exists on the base
    (...)

``usrmerge`` and the lifecycle layer
------------------------------------

After pruning, the contents of the prime directory are packed as a new OCI
layer. In concrete terms, the files and directories are added to a
`tar archive`_, so each file (or directory) is added to the archive together
with the "destination" path that it should have when the archive is extracted.
In most cases, the file's original path (relative to the root of the archive)
and its destination path once extracted are the same, so the file that exists
in the prime directory as ``a/b/c/file.txt`` should be extracted as
``a/b/c/file.txt``.

However, there are cases where this "destination" path should be changed. For
example, consider again the contents of the previous rock's prime directory:

.. code-block:: bash

    $ ls -l
    total 5
    drwxr-xr-x 2 root root  3 Dec  7 20:30 bin
    drwxr-xr-x 9 root root 10 Dec  7 20:30 etc
    drwxr-xr-x 4 root root  4 Dec  7 20:30 lib
    drwxr-xr-x 2 root root  2 Dec  7 20:30 lib64
    drwxr-xr-x 2 root root  2 Dec  7 20:30 sbin
    drwxr-xr-x 7 root root  7 Dec  7 20:30 usr
    drwxr-xr-x 4 root root  4 Dec  7 20:30 var
    $ ls bin/
    pebble

So ``bin/`` is a regular directory and contains the ``pebble`` binary, to serve
as the rock's entrypoint. However, consider the base directory structure of an
Ubuntu system:

.. code-block:: bash

    $ ls -l /
    total 84
    lrwxrwxrwx   1 root root     7 ago 27  2022 bin -> usr/bin
    drwxr-xr-x   5 root root  4096 nov 27 13:59 boot
    drwxrwxr-x   2 root root  4096 ago 27  2022 cdrom
    drwxr-xr-x  20 root root  5900 dez  7 19:57 dev
    drwxr-xr-x 148 root root 12288 dez  7 15:15 etc
    drwxr-xr-x   3 root root  4096 ago 27  2022 home
    lrwxrwxrwx   1 root root     7 ago 27  2022 lib -> usr/lib
    lrwxrwxrwx   1 root root     9 ago 27  2022 lib32 -> usr/lib32
    lrwxrwxrwx   1 root root     9 ago 27  2022 lib64 -> usr/lib64
    lrwxrwxrwx   1 root root    10 ago 27  2022 libx32 -> usr/libx32

``bin`` is actually a symbolic link to ``usr/bin``. This is the usrmerge_, and
it's been present in Ubuntu for many years now. Note that many other entries
are also symlinks, like ``lib`` (to ``usr/lib``) and ``lib64`` (to
``usr/lib64``).

These two filesystems interact in a surprising way when stacked as OCI layers.
Suppose ``bin/pebble`` is added to the layer's archive as ``bin/pebble``,
together with an entry for the ``bin/`` directory (which is a regular directory
in the prime contents). Once the two layers are stacked together in a
container, the ``bin/`` directory from the "prime layer" will *overwrite* the
``bin -> usr/bin`` symlink from the "base layer". This breaks everything that
assumes that the base binaries from ``usr/bin/`` are always accessible through
``bin/``.

This issue is made much worse if, instead of breaking ``bin/``, we break the
``lib*/`` symlinks. Consider:

.. code-block:: bash

    $ ldd /bin/bash
        linux-vdso.so.1 (0x00007ffdf2af4000)
        libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f6053cbd000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6053a00000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6053e6b000)

The ``bash`` binary links to multiple dynamic libraries, but has a hardcoded
path to the ``/lib64/ld-linux-x86-64.so.2`` dynamic loader. This loader is the
program that does the actual finding of dynamic dependencies at runtime, and in
an Ubuntu system its actual location is ``/usr/lib64/ld-linux-x86-64.so.2``. So
if the ``/lib64 -> usr/lib64`` symlink is broken because the prime directory
contains ``lib64`` as a regular directory, then the vast majority of the
binaries in the final rock's base system will simply fail to run because their
loader is no longer available at ``/lib64/ld-linux-x86-64.so.2``.

To fix this, Rockcraft takes the base system into account when creating the
archive for the prime layer. For instance, when considering ``bin/pebble``,
Rockcraft will:

#. Skip adding ``bin/`` as a regular directory, to avoid breaking the base
   system, and
#. Add ``bin/pebble`` as ``usr/bin/pebble`` in the layer archive.

This can be seen in the logs:

.. code-block:: text

    (...)
    Creating new layer
    (...)
    Skipping /root/prime/bin because it exists as a symlink on the lower layer
    (...)
    Adding to layer: /root/prime/bin/pebble as 'usr/bin/pebble'
    (...)

Finally, as mentioned at the beginning, none of this applies to rocks with
``bare`` bases, as there is no base system to contain duplicates that need to
be pruned or symbolic links that need to be taken into account.

.. _tar archive: https://github.com/opencontainers/image-spec/blob/main/layer.md
.. _usrmerge: https://wiki.debian.org/UsrMerge
.. _Dive: https://github.com/wagoodman/dive
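
The pruning and path-remapping rules described on this page can be sketched in
Python as follows. This is a minimal illustration, not Rockcraft's actual
implementation: the names ``is_duplicate`` and ``archive_name`` are
hypothetical, only top-level symlinks such as ``bin -> usr/bin`` are handled,
and duplicates are looked up at the same relative path in the base. It assumes
a POSIX system where symlinks can be created.

```python
import filecmp
import os
import stat
import tempfile
from pathlib import Path, PurePosixPath
from typing import Optional


def is_duplicate(prime_file: Path, base_file: Path) -> bool:
    """True if both are regular files with equal contents, ownership and permissions."""
    if prime_file.is_symlink() or base_file.is_symlink():
        return False
    if not (prime_file.is_file() and base_file.is_file()):
        return False
    p, b = prime_file.stat(), base_file.stat()
    same_meta = (p.st_uid, p.st_gid, stat.S_IMODE(p.st_mode)) == (
        b.st_uid, b.st_gid, stat.S_IMODE(b.st_mode)
    )
    # shallow=False forces a byte-by-byte comparison of the contents.
    return same_meta and filecmp.cmp(prime_file, base_file, shallow=False)


def archive_name(rel: PurePosixPath, base: Path) -> Optional[str]:
    """Destination path for a prime entry; None means "skip this entry"."""
    top = base / rel.parts[0]
    if not top.is_symlink():
        return str(rel)  # nothing special: keep the original path
    if len(rel.parts) == 1:
        return None  # e.g. "bin": skip it so the base symlink survives
    # e.g. "bin/pebble" becomes "usr/bin/pebble" when the base has bin -> usr/bin
    return str(PurePosixPath(os.readlink(top), *rel.parts[1:]))


# Hypothetical miniature "base" and "prime" trees, for demonstration only.
with tempfile.TemporaryDirectory() as tmp:
    base, prime = Path(tmp, "base"), Path(tmp, "prime")
    (base / "usr/bin").mkdir(parents=True)
    (base / "bin").symlink_to("usr/bin")  # usrmerge symlink in the base
    (base / "etc").mkdir()
    (base / "etc/hosts").write_text("same")
    (prime / "bin").mkdir(parents=True)
    (prime / "bin/pebble").write_text("pebble")
    (prime / "etc").mkdir()
    (prime / "etc/hosts").write_text("same")  # duplicate of the base file

    results = {}
    for path in sorted(prime.rglob("*")):
        rel = PurePosixPath(path.relative_to(prime).as_posix())
        if is_duplicate(path, base / rel):
            results[str(rel)] = "pruned"
        else:
            results[str(rel)] = archive_name(rel, base)
    print(results)
```

In this sketch, ``etc/hosts`` is marked as pruned because an identical file
exists in the base, the ``bin`` directory entry is skipped, and ``bin/pebble``
is stored under ``usr/bin/pebble``, which mirrors the log messages shown
earlier on this page.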