Run QEMU

Introduction

Many software engineers use macOS as their main machine and want a virtual Linux environment for learning Linux and system programming. This article shows how to create one with QEMU. I also explain the setup from the perspective of the boot process, since that is what I learned while troubleshooting it.

Preparations

QEMU is software that can emulate a whole machine. Combined with an accelerator such as macOS's Hypervisor.framework, it works as a hypervisor. I mainly use macOS, but I use QEMU for experiments with Linux. I use QEMU rather than Docker containers because it virtualizes the whole machine, including the kernel, which makes it suitable for low-level experiments.

What we want to do

(1) Starting up a Linux environment

(2) Installing Rust and other tools once and persisting them, so that they remain available on every boot after the first.

(3) Mounting directories from the host OS since I want to edit files on the host OS.

How to run and set up

Running command

You can start it with the following command.

qemu-system-aarch64 \
  -machine virt,accel=hvf \
  -cpu host \
  -m 4G \
  -smp 4 \
  -kernel vm/Image \
  -initrd vm/initramfs \
  -drive file=vm/rootfs.img,if=virtio,format=raw \
  -netdev user,id=net0,hostfwd=tcp::2222-:22 \
  -device virtio-net-device,netdev=net0 \
  -virtfs local,path=./shared,mount_tag=shared,security_model=none,id=shared \
  -nographic \
  -append "console=ttyAMA0 alpine_repo=http://dl-cdn.alpinelinux.org/alpine/v3.19/main/ modloop=http://dl-cdn.alpinelinux.org/alpine/v3.19/releases/aarch64/netboot-3.19.0/modloop-virt"

The following explanations of the options were generated by an LLM.

qemu-system-aarch64: QEMU binary for ARM64 (aarch64). Used when running ARM Linux on Apple Silicon Mac.

-machine virt,accel=hvf: Specifies the machine type and acceleration method for the virtual machine.
- virt: Generic ARM virtual machine board provided by QEMU. Uses a virtual standard board rather than a specific SoC of a physical machine.
- accel=hvf: Enables hardware virtualization using macOS's Hypervisor.framework (HVF). Significantly faster than software emulation.

-cpu host: Configuration that makes the CPU visible to the guest as close as possible to the host CPU. By exposing Apple Silicon's CPU features directly to the guest, it achieves both compatibility and performance.

-m 4G: Allocates 4GB of RAM to the virtual machine.

-smp 4: Sets the virtual CPU to 4 cores. Configuration to increase parallelism for builds and heavy processing.

-kernel vm/Image: Path to the Linux kernel image used for booting. QEMU directly loads this file into memory without going through a bootloader.

-initrd vm/initramfs: Path to the initramfs image. Initial RAM disk containing tools and drivers needed until the root filesystem is mounted.

-drive file=vm/rootfs.img,if=virtio,format=raw: Configures a virtual disk drive.
- file=vm/rootfs.img: Root filesystem image with Alpine Linux installed.
- if=virtio: Connects as a VirtIO block device. Faster than standard SATA emulation.
- format=raw: Indicates that the image format is raw (raw disk image).

-netdev user,id=net0,hostfwd=tcp::2222-:22: Configuration for user-mode networking and port forwarding.
- user: Uses QEMU's built-in user-mode network stack (SLIRP). Allows access to external networks without additional bridge configuration or root privileges on the host.
- id=net0: Assigns the identifier "net0" to this network backend.
- hostfwd=tcp::2222-:22: Forwards the host's TCP port 2222 to the guest's port 22 (SSH). You can connect from the host side using ssh -p 2222 root@localhost.

-device virtio-net-device,netdev=net0: Adds a VirtIO network device to the guest and connects it to the previously defined net0. VirtIO NIC has less overhead than emulation and provides fast network I/O.

-virtfs local,path=./shared,mount_tag=shared,security_model=none,id=shared: Shares the host directory ./shared with the guest using 9p (VirtFS).
- path=./shared: Directory to share from the host side.
- mount_tag=shared: Tag name specified when mounting on the guest side. Example: mount -t 9p -o trans=virtio shared /mnt/shared
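- security_model=none: Accesses files directly with the permissions of the QEMU process, without ID translation, and does not report failures to set attributes such as ownership. Convenient for experimental purposes, but be careful with permissions.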
- id=shared: Identifier for this VirtFS device.

-nographic: Does not open a graphical console window and uses standard input/output (terminal) directly as the guest's serial console. Works well for server-like usage and SSH-based environments.

-append "console=ttyAMA0 alpine_repo=... modloop=...": Specifies kernel command-line arguments together.
- console=ttyAMA0: Sets the kernel's console output destination to ARM's serial port ttyAMA0. When combined with -nographic, boot logs will appear in the terminal.
- alpine_repo=...: Repository URL from which Alpine Linux retrieves packages. The netboot image looks at this URL to fetch necessary packages.
- modloop=...: Location from which Alpine retrieves the squashfs image of kernel modules. Modules are loaded over the network from the path specified here.

The overview is as follows.

The command specifies local images for the kernel and initrd. With this boot method, QEMU skips the bootloader stage that a real machine goes through and starts by loading the kernel directly. The aim is to keep the virtual machine simple.
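Because the kernel, initramfs, and disk image are plain local files, a missing one only shows up as a QEMU error at launch. The following is a minimal sketch of a pre-flight check in POSIX sh. The helper name missing_files is hypothetical; the paths in the usage comment are the ones from the command above.

```shell
# Report every argument that is not an existing regular file.
# Returns 0 if all files exist, 1 otherwise.
missing_files() {
  rc=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the host, before starting QEMU:
#   missing_files vm/Image vm/initramfs vm/rootfs.img && echo "ready to boot"
```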

The kernel always starts with rootfs, an in-memory filesystem (ramfs or tmpfs), and extracts the initramfs into it. The initramfs is a cpio archive containing files such as /init. Once it has been extracted, the kernel runs /init, which performs tasks such as configuring the network and finding and mounting the actual root device.

** See ramfs-rootfs-initramfs.txt in the Linux kernel documentation. (Confusingly, "rootfs" usually means the root filesystem in general, but here it refers to the ramfs or tmpfs instance that serves as the root before the main root filesystem is mounted.)

Note that with this boot method, the actual root filesystem also lives in memory. The VM has 4 GiB of memory, and because RAM backs the root filesystem, downloading large files quickly results in a “No space left on device” error.
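One way to avoid running into that error unexpectedly is to check the free space before a large download. This is a minimal sketch; the helper names free_kib and has_room are hypothetical, and the 1 GiB threshold in the usage comment is only an example.

```shell
# Print the free space, in KiB, of the filesystem that holds path $1.
# -P forces the portable output format with one line per filesystem.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $2 KiB are free on the filesystem holding $1.
has_room() {
  [ "$(free_kib "$1")" -ge "$2" ]
}

# Usage inside the guest: fall back to the persistent disk when the
# RAM-backed root has less than 1 GiB (1048576 KiB) free.
#   has_room / 1048576 || cd /data
```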

The -append option specifies kernel parameters. alpine_repo is the repository URL from which Alpine Linux retrieves packages. modloop specifies the URL for downloading module groups. initramfs contains only minimal components, so common filesystem drivers like ext4 are not included. Therefore, modloop is used to make kernel modules available as /lib/modules. This enables commands like modprobe ext4.

The -drive option specifies a disk image. Although it’s named rootfs.img, it does not become the root filesystem; it’s just a block device image containing ext4. This is later mounted at /data for persistence. (The naming reflects an attempt to persist the root filesystem, as I didn’t realize that this method is designed to build the root filesystem in memory.)
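The disk passed with if=virtio shows up in the guest as /dev/vda, and its partitions as /dev/vda1, /dev/vda2, and so on. Which partition holds the ext4 filesystem depends on how the image was created, so it is worth listing them before mounting. The sketch below reads the kernel's partition table; the helper name list_partitions is hypothetical.

```shell
# List the partitions of a disk as the kernel reports them.
# $1: disk name without /dev/ (for example "vda")
# $2: table to read (defaults to /proc/partitions)
list_partitions() {
  awk -v disk="$1" '
    # Skip the two header lines; the 4th column is the device name.
    NR > 2 && $4 ~ ("^" disk "[0-9]+$") { print "/dev/" $4 }
  ' "${2:-/proc/partitions}"
}

# Usage inside the guest:
#   list_partitions vda
```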

Directory sharing with the host

For directory sharing with the host, the guest can mount the shared directory over 9p (9pfs). Since modprobe is available in this environment, loading the required modules was straightforward.
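Whether the kernel can already mount 9p can be checked through /proc/filesystems, which lists every filesystem type currently registered. A minimal sketch, with the hypothetical helper name fs_supported:

```shell
# Succeed if the kernel currently knows filesystem type $1.
# $2: file to read (defaults to /proc/filesystems)
fs_supported() {
  # The type name is the last field; "nodev" may precede it.
  awk -v name="$1" '$NF == name { found = 1 } END { exit !found }' \
    "${2:-/proc/filesystems}"
}

# Usage inside the guest: load the modules only when 9p is not yet available.
#   fs_supported 9p || modprobe -a 9p 9pnet 9pnet_virtio
```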

Initial setup after boot

The initial procedure after booting with the command above is as follows. I’ve confirmed it works up to the point where Rust can be executed.

# 1. Mount shared folder
mkdir -p /mnt/shared
# -a loads every listed module; without it, the extra names are treated as parameters for 9p
modprobe -a 9p 9pnet 9pnet_virtio
mount -t 9p -o trans=virtio shared /mnt/shared

# 2. Mount persistence disk
mkdir -p /data
modprobe ext4
mount /dev/vda3 /data

# 3. Rust environment setup (recommended to add to ~/.profile)
export CARGO_HOME=/data/cargo
export RUSTUP_HOME=/data/rustup
export TMPDIR=/data/tmp
mkdir -p "$TMPDIR"   # the directory must exist before tools try to use it
export PATH="/data/cargo/bin:$PATH"

# 4. First time only: Install Rust
if [ ! -f /data/cargo/bin/cargo ]; then
  cd /data
  apk add build-base curl
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --no-modify-path
  # Reset environment variables
  export CARGO_HOME=/data/cargo
  export RUSTUP_HOME=/data/rustup
  export TMPDIR=/data/tmp
  export PATH="/data/cargo/bin:$PATH"
fi

# 5. Copy project to shared folder (first time only)
# Execute on host side: make sync-to-shared

# 6. Run experiments
source ~/.profile
cd /mnt/shared/project
cargo test -- --nocapture

Note that the following commands must also be run on every subsequent boot, because the mount points and environment variables live on the RAM filesystem and are lost when the system stops.

# Mount
mkdir -p /mnt/shared /data
modprobe -a 9p 9pnet 9pnet_virtio && mount -t 9p -o trans=virtio shared /mnt/shared
modprobe ext4 && mount /dev/vda3 /data

# Set environment variables
export CARGO_HOME=/data/cargo
export RUSTUP_HOME=/data/rustup
export PATH="/data/cargo/bin:$PATH"
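Since ~/.profile also lives in RAM, one option is to keep the environment variables in a file on the persistent disk and source it after mounting. The following is a minimal sketch of that idea, not something I have in my setup. The helper names is_mounted and write_env_file and the file name env.sh are hypothetical, and the directory passed in is assumed not to contain spaces.

```shell
# Succeed if something is mounted at directory $1.
# $2: mount table to read (defaults to /proc/mounts)
is_mounted() {
  awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
    "${2:-/proc/mounts}"
}

# Write the Rust-related variables to $1/env.sh so they survive reboots.
# $PATH stays literal in the file and is expanded when the file is sourced.
write_env_file() {
  {
    printf 'export CARGO_HOME=%s/cargo\n' "$1"
    printf 'export RUSTUP_HOME=%s/rustup\n' "$1"
    printf 'export TMPDIR=%s/tmp\n' "$1"
    printf 'export PATH="%s/cargo/bin:$PATH"\n' "$1"
  } > "$1/env.sh"
}

# Usage inside the guest on each boot:
#   is_mounted /data || mount /dev/vda3 /data
#   [ -f /data/env.sh ] || write_env_file /data
#   . /data/env.sh
```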

Additional knowledge

What is a serial console?

In this configuration, -nographic and console=ttyAMA0 are specified. This reproduces in the terminal what is essentially “plugging a serial cable into a physical machine and viewing its logs on a serial console.” The laptops we usually use have a monitor and keyboard, so having a display is taken for granted, but embedded devices and old servers may not have one. In such cases, the serial console is the interface for logging in through a serial port. AWS EC2 offers a similar feature, the EC2 Serial Console, which can be used, for example, when an instance fails during startup.

In QEMU, adding -nographic prevents a graphical window from opening and treats standard input/output as a serial console. Furthermore, adding console=ttyAMA0 allows “directing the kernel’s console output to a virtual serial port.” As a result, everything from boot messages to the login prompt flows to the terminal.

Alpine Linux /init boot sequence

In Alpine Linux’s netboot image, after the kernel starts, initramfs is first extracted, and then the /init script within it is executed. The rough flow is as follows:

  1. The kernel extracts the initramfs image onto RAM
  2. /init on RAM is executed
  3. Within /init, it reads alpine_repo= and modloop= from the kernel parameters, accesses the specified URLs via HTTP, and downloads the modloop image
  4. The downloaded modloop is loopback-mounted, making modules under /lib/modules/… available
  5. Once the necessary preparations are complete, the actual root filesystem is mounted, and control is transferred through a switch_root-like process

In this configuration, since the root= parameter is intentionally not specified, the “actual root filesystem” remains as tmpfs on RAM rather than a persistent disk. In other words, Alpine’s /init is in “do everything in RAM” mode, and we mount /data and other directories on top of it. alpine_repo and modloop are merely “entry points for fetching necessary modules over the network from the initramfs world,” and user-space applications themselves access the repository through apk when needed.
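To make step 3 more concrete, here is a minimal sketch of how a script can pick a KEY=VALUE option out of the kernel command line. It is not Alpine's actual /init code; the helper name kopt is hypothetical, and values containing spaces are not handled.

```shell
# Print the value of option $1 from a kernel command line.
# $2: command line to search (defaults to the contents of /proc/cmdline)
# The body runs in a subshell so that "set -f" does not leak to the caller.
kopt() (
  set -f  # no glob expansion while the command line is split into words
  for word in ${2-$(cat /proc/cmdline)}; do
    case "$word" in
      "$1"=*)
        # Strip everything up to and including the first "=".
        printf '%s\n' "${word#*=}"
        exit 0
        ;;
    esac
  done
  exit 1
)

# Usage inside the guest:
#   kopt modloop
#   kopt alpine_repo
```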

I think this configuration is reasonable. It’s a balanced approach: keeping the kernel and initramfs local speeds up booting, while modules and userland can be switched flexibly over the network.

What are modloop and modprobe commands?

The key points in this Alpine netboot configuration are modloop and modprobe.

In Alpine’s netboot image, a “collection of kernel modules” is provided as a squashfs image separately from the kernel itself, and this is called modloop. By downloading modloop somewhere and loopback-mounting it, the module groups under /lib/modules/… become visible.

modprobe is a command that loads kernel modules and resolves their dependencies. In this setup we run commands such as modprobe ext4 and modprobe -a 9p 9pnet 9pnet_virtio (-a is needed to load several modules in one call), and all of them load modules provided via modloop. The initramfs does not include the ext4 and 9p modules, so without modloop the disk and the shared folder could not be mounted.
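To confirm that a modprobe call actually loaded something, /proc/modules can be inspected (it is the same data that lsmod prints). A minimal sketch with the hypothetical helper name module_loaded follows; note that drivers built into the kernel never appear in this list.

```shell
# Succeed if module $1 is currently loaded.
# $2: file to read (defaults to /proc/modules)
module_loaded() {
  # The kernel lists module names with "_" even when the file name uses "-".
  name=$(printf '%s' "$1" | tr '-' '_')
  awk -v name="$name" '$1 == name { found = 1 } END { exit !found }' \
    "${2:-/proc/modules}"
}

# Usage inside the guest:
#   module_loaded 9pnet_virtio && echo "9p over virtio is ready"
```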

ISO boot

So far, we’ve used a “netboot-like” boot method that directly specifies the kernel and initramfs, but it is also possible to boot from an ISO image. I have burned an ISO image to a DVD and booted Linux from it before, and this should be a similar experience. In QEMU, the command would look like this, for example.

qemu-system-aarch64 \
  -machine virt,accel=hvf \
  -cpu host \
  -m 4G \
  -smp 4 \
  -drive file=vm/rootfs.img,if=virtio,format=raw \
  -cdrom alpine-virt-3.19.0-aarch64.iso \
  -nographic

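In this case, virtual firmware (the equivalent of BIOS/UEFI) runs first, finds the bootloader within the ISO image (such as GRUB; isolinux plays this role on x86 ISOs), and starts it. Note that the aarch64 virt machine does not load any firmware by default, so a UEFI firmware image typically has to be passed as well, for example with the -bios option.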
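On the aarch64 virt machine the firmware usually has to be supplied explicitly (for example with -bios), so it helps to know where a UEFI image is installed. The sketch below searches a list of directories for a firmware file. The helper name find_firmware is hypothetical, and both the file name edk2-aarch64-code.fd and the directories in the usage comment are assumptions about common QEMU installations that may not match yours.

```shell
# Print the first existing "$dir/$name" among the given directories.
# $1: firmware file name; remaining arguments: directories to search
find_firmware() {
  fw_name=$1
  shift
  for dir in "$@"; do
    if [ -f "$dir/$fw_name" ]; then
      printf '%s\n' "$dir/$fw_name"
      return 0
    fi
  done
  return 1
}

# Usage on the host (file name and directories are assumptions):
#   fw=$(find_firmware edk2-aarch64-code.fd \
#     /opt/homebrew/share/qemu /usr/local/share/qemu /usr/share/qemu) \
#     && echo "add to the command: -bios $fw"
```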

Closing Thought

I chose a method that skips the early boot stages and starts from loading the kernel because it boots faster. I might not have needed Alpine for this. Still, I’m glad I learned a bit about what initramfs is along the way. It seems possible to customize the initramfs for a specific use case. I need to learn more about the boot process.

This article is written by K.Waki

Software Engineer. English Learner. Opinions expressed here are mine alone.