Introduction to Code Generation in Rust

November 2, 2023

This article is about generating Rust code from other Rust code, not about the code generation step of the rustc compiler. Another term for source code generation is metaprogramming, but it will be referred to as code generation here. The reader is expected to have some Rust knowledge.

What problems can it solve?

Say I want to ship a web frontend embedded inside a Rust binary, such as a desktop application, to end users. Projects like Tauri achieve embedding with code generation by writing Rust code that generates more Rust code. Why does Tauri choose to use code generation over less complicated solutions? Let’s take a look at what a simpler solution might look like.

Imagine the output of our web frontend looks like:

dist
├── assets
│   ├── script-44b5bae5.js
│   └── style-48a8825f.css
└── index.html

Let’s embed these in our Rust project by using include_str!(), which adds the content of the specified file into the binary. That would look something like this:

use std::collections::HashMap;

fn main() {
    let mut assets = HashMap::new();

    assets.insert(
        "/index.html",
        include_str!("../dist/index.html")
    );

    assets.insert(
        "/assets/script-44b5bae5.js",
        include_str!("../dist/assets/script-44b5bae5.js")
    );

    assets.insert(
        "/assets/style-48a8825f.css",
        include_str!("../dist/assets/style-48a8825f.css")
    );
}

Straightforward enough. Now we can grab those assets directly from the final binary! However, what if we don’t always know the assets’ filenames ahead of time? Let’s say we have worked more on our frontend project and now its output looks like:

dist
├── assets
│   │   # script-44b5bae5.js previously
│   ├── script-581f5c69.js
│   │   # style-48a8825f.css previously
│   └── style-e49f12aa.css
└── index.html

Ah… the filenames of our assets have changed because our frontend bundler uses cache busting, which puts a content hash in each filename. The Rust code no longer compiles until we fix the filenames inside of it. It would be a terrible developer experience if we had to update our Rust code every time we changed the frontend - imagine if we had dozens of assets! Tauri uses code generation to avoid this by finding the assets at compile time and generating Rust code that embeds the correct assets.

Tools

Let’s talk about a few tools for code generation and then use them to implement our own simple asset bundler.

  • The quote crate lets us write what looks like ordinary Rust code inside the quote! macro. That code is turned into a stream of tokens, which can then be written out as Rust source code. This crate is ubiquitous across the Rust ecosystem for writing code generation.
  • The walkdir crate provides an easy way to recursively walk all entries in a directory. We will use it to find the assets.
  • The phf crate implements a map built with perfect hash functions. This is useful when all keys and values in the map are known at compile time, which is exactly the case for our embedded assets.

Rust code generation typically occurs in build scripts or macros. We will be building our simple asset bundler using build scripts because we will be accessing the disk. Procedural macros can also access the disk, but doing so can be problematic in a few ways.

Building the Assets Bundler

The source code is available on GitHub if you want to see how everything is put together afterwards.

Create our library

Let’s start off with creating a new Rust library:

cargo new --lib asset-bundler
cd asset-bundler

We want to create a way for applications that use this library to grab the assets, so let’s create that first. This will involve us creating a wrapper around phf::Map and a method to let callers get the content.

cargo add phf --features macros

We don’t need too much functionality from our Assets struct, just a way to create it and a way to get at what’s inside of it. The following goes into src/lib.rs:

pub use phf; // re-export phf so we can use it later

type Map = phf::Map<&'static str, &'static str>;

/// Container for compile-time embedded assets.
pub struct Assets(Map);

impl From<Map> for Assets {
    fn from(value: Map) -> Self {
        Self(value)
    }
}

impl Assets {
    /// Get the contents of the specified asset path.
    pub fn get(&self, path: &str) -> Option<&str> {
        self.0.get(path).copied()
    }
}
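
To make the idea of a map whose keys are all known at compile time more concrete, here is a minimal std-only sketch with the same get API. It is a stand-in, not how phf works: instead of perfect hashing it uses a sorted static slice and a binary search. The names StaticAssets and DEMO are hypothetical and are not part of our library.

```rust
/// Hypothetical stand-in for `Assets`: a static slice sorted by key.
pub struct StaticAssets(&'static [(&'static str, &'static str)]);

impl StaticAssets {
    /// Look up an asset by path with a binary search over the sorted keys.
    pub fn get(&self, path: &str) -> Option<&'static str> {
        self.0
            .binary_search_by_key(&path, |(key, _)| *key)
            .ok()
            .map(|index| self.0[index].1)
    }
}

/// Entries must be sorted by key for the binary search to be valid.
pub static DEMO: StaticAssets = StaticAssets(&[
    ("assets/script-44b5bae5.js", "console.log('hi')"),
    ("index.html", "<!doctype html>"),
]);
```

Calling DEMO.get("index.html") returns the embedded content, and an unknown path returns None. The real Assets type behaves the same way from the caller's point of view, with phf doing the lookup.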

Codegen

Now we build the library that will be used in a build script to generate our code. Because we will have multiple crates in the same repository, let’s quickly convert the project to a Cargo workspace by adding the following to the top of our Cargo.toml:

[workspace]
members = ["codegen"]

Now we are ready to continue creating our codegen library. Run these commands to create our project and grab our dependencies:

cargo new --lib codegen --name asset-bundler-codegen
cargo add quote walkdir --package asset-bundler-codegen

Time to think a bit about what functionality we need and boil it down into a few concrete steps.

  • We pass an assets path to our function, which we will call base.
  • We check if base exists, or else we can’t do anything.
  • Recursively gather all file paths inside base.
  • Generate code to embed all the file paths.

One last thing to mention: we want to get assets by passing in a relative path. We want assets.get("index.html"), not assets.get("../dist/index.html"). This means we will need to keep track of the base directory passed into our function. Let’s write those requirements down as code inside of codegen/src/lib.rs:

use std::path::{Path, PathBuf};

/// Generate Rust code to create an [`asset_bundler::Assets`] from the passed path.
pub fn codegen(path: &Path) -> std::io::Result<String> {
    // canonicalize also checks if the path exists
    // which is the only case that makes sense for us
    let base = path.canonicalize()?;

    let paths = gather_asset_paths(&base);
    Ok(generate_code(&paths, &base))
}

/// Recursively find all files in the passed directory.
fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
  todo!()
}

/// Generate Rust code to create an [`asset_bundler::Assets`].
fn generate_code(paths: &[PathBuf], base: &Path) -> String {
  todo!()
}
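
The comment above relies on canonicalize failing for a path that does not exist. As a small sketch of that check in isolation, here is a hypothetical helper, resolve_base, that additionally rejects a path that exists but is not a directory. The extra check and the helper name are assumptions added for illustration; the bundler in this article only calls canonicalize.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `path` and insist that it is a directory.
pub fn resolve_base(path: &Path) -> io::Result<PathBuf> {
    // canonicalize returns an error if the path does not exist
    let base = path.canonicalize()?;

    if base.is_dir() {
        Ok(base)
    } else {
        // the path exists but points at a file, so there is nothing to walk
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("{} is not a directory", base.display()),
        ))
    }
}
```

Because the function returns std::io::Result, it could be dropped into codegen with the same ? operator that canonicalize uses.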

Let’s take on gather_asset_paths first, since it’s more specific to our project than codegen. We will use walkdir to recursively grab all the files from the passed base directory. This is a simple example project, so we will ignore errors for now by using flatten(), which flattens nested iterators. Because Result also implements IntoIterator (yielding its value when it is Ok and nothing when it is Err), we are left with only the successful values. Let’s implement it in codegen/src/lib.rs:

/// Recursively find all files in the passed directory.
fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
  let mut paths = Vec::new();
  for entry in WalkDir::new(base).into_iter().flatten() {
    // we only care about files, ignore directories
    if entry.file_type().is_file() {
      paths.push(entry.into_path())
    }
  }

  paths
}
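
To see what flatten() does to an iterator of Result values without touching the disk, here is a small std-only sketch. The function names keep_ok and all_or_first_error are made up for this illustration. The second function shows one alternative we could switch to later if we decide that errors should not be silently dropped.

```rust
/// Keep only the `Ok` values, silently dropping every error.
pub fn keep_ok(results: Vec<Result<u32, String>>) -> Vec<u32> {
    // each Result iterates over zero items (Err) or one item (Ok)
    results.into_iter().flatten().collect()
}

/// Alternative: stop at the first error and return it to the caller.
pub fn all_or_first_error(results: Vec<Result<u32, String>>) -> Result<Vec<u32>, String> {
    // collecting into a Result short-circuits on the first Err
    results.into_iter().collect()
}
```

With the input [Ok(1), Err(..), Ok(3)], keep_ok returns the two successful values, while all_or_first_error returns the error.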

Cool cool cool.

Now we have a list of all asset files that are supposed to be included in the binary. The second function will generate the actual Rust code, but let’s see what the code we are generating should look like. We need to make sure that:

  • We import the correct dependencies.
  • The phf::Map is created with all the values. We can use phf::phf_map! to help.
  • Our Assets struct from our first library is created.

The first point is pretty important: we need to make sure we are calling the correct library. We can prevent crate name collisions by using a leading :: on our use statement, which makes the path resolve to the external crate rather than to a local item with the same name. Additionally, we need to import our re-exported phf; otherwise the end application will fail to compile if it doesn’t itself depend on phf.

Using the frontend example from above, this is what the generated code should look like:

use ::asset_bundler::{Assets, phf::{self, phf_map}};

let map = phf_map! {
  "index.html" => include_str!("../dist/index.html"),
  "assets/script-44b5bae5.js" => include_str!("../dist/assets/script-44b5bae5.js"),
  "assets/style-48a8825f.css" => include_str!("../dist/assets/style-48a8825f.css")
};

let assets = Assets::from(map);

Our first problem is that we only have the paths used in include_str!(); we don’t have the “key” paths. We also need to turn our paths into strings at some point, because that is how they are used in the generated code. Let’s first figure out how to transform our list of paths into a list of strings suitable for keys. We need to strip the base prefix we resolved earlier from all the paths, so let’s write that inside of codegen/src/lib.rs:

/// Turn paths into relative paths suitable for keys.
fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
  let mut keys = Vec::new();

  for path in paths {
    // ignore this failure case for this example
    if let Ok(key) = path.strip_prefix(base) {
      keys.push(key.to_string_lossy().into())
    }
  }

  keys
}
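
One caveat with turning the relative path straight into a string: on Windows the result would likely contain backslashes, so a key would read assets\style.css rather than assets/style.css. If that matters for your use case, one possible variation is to join the path components with / yourself. The helper below, portable_key, is a hypothetical sketch of that idea and is not part of the code in this article.

```rust
use std::path::Path;

/// Hypothetical variant of the key logic: join components with `/`
/// so keys look the same on every platform.
pub fn portable_key(path: &Path, base: &Path) -> Option<String> {
    // None when `path` is not inside `base`
    let relative = path.strip_prefix(base).ok()?;

    let parts: Vec<String> = relative
        .components()
        .map(|part| part.as_os_str().to_string_lossy().into_owned())
        .collect();

    Some(parts.join("/"))
}
```

For a path of /project/dist/assets/style.css and a base of /project/dist, this yields assets/style.css. A path outside of base yields None.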

The values of the map are easier. Because we walked a canonicalized base directory, the paths are absolute and already in the form include_str!() needs, so we just need to turn them into strings. Let’s write this one with an Iterator, which we also could have done with keys:

let values = paths.iter().map(|p| p.to_string_lossy());

So now we have both keys and values in usable formats. Next comes the macro part, where we will actually be generating code from all the data.

Let’s talk about why we are about to use double braces. They are not required for code generation in general, but in our case we want to be able to use the resulting Assets anywhere. The outer pair of braces delimits the input to quote!, and the inner pair turns the generated code into a block expression, which can be used anywhere an expression is valid. That is lots of places.
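
As a quick refresher on block expressions, here is a plain Rust sketch with no code generation involved. The function asset_count is just an illustrative name. Note how the use statement lives inside the block, which is the same trick our generated code relies on to bring its imports along without leaking them into the caller's scope.

```rust
/// A block is an expression, so it can sit on the right-hand side of a `let`.
pub fn asset_count() -> usize {
    let count = {
        // this import is scoped to the block and does not leak out of it
        use std::collections::HashMap;

        let mut assets = HashMap::new();
        assets.insert("index.html", "<!doctype html>");

        // the last expression is the value of the whole block
        assets.len()
    };

    count
}
```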

Second, we are about to use some very unfamiliar syntax for those of you who have not written macros before. While it may seem strange at first, the syntax here is widely used across the ecosystem. In particular, we are going to be using the repetition syntax of quote. This allows us to use our two collections of keys and values together.

Let’s do it:

quote! {{
  use ::asset_bundler::{Assets, phf::{self, phf_map}};
  Assets::from(phf_map! {
    #( #keys => include_str!(#values) ),*
  })
}}

While the syntax is surely a departure from normal Rust code, hopefully you are able to recognize some familiar patterns we already went over. Here’s a side-by-side comparison to the phf_map! example we did before:

let keys = ["key1", "key2", "key3"];
let values = ["value1", "value2", "value3"];
quote! {
  phf_map! {
    #( #keys => include_str!(#values) ),*
  }
}

// turns into this
phf_map! {
  "key1" => include_str!("value1"),
  "key2" => include_str!("value2"),
  "key3" => include_str!("value3")
}
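
If the repetition syntax still feels opaque, it can help to see roughly the same expansion written by hand with plain strings. The sketch below is not how quote works internally (quote produces tokens, not strings), and render_map is a hypothetical name. It only mirrors the pairing of each key with its value and the comma separator.

```rust
/// Hand-rolled, string-based imitation of the repetition above.
pub fn render_map(keys: &[&str], values: &[&str]) -> String {
    let entries: Vec<String> = keys
        .iter()
        .zip(values)
        // {:?} prints each string as a quoted, escaped literal
        .map(|(key, value)| format!("{key:?} => include_str!({value:?})"))
        .collect();

    // the `,` after `#( ... )` in quote plays the role of this join
    format!("phf_map! {{ {} }}", entries.join(", "))
}
```

Building source code out of strings like this gets fragile quickly, which is a large part of why quote is the usual choice.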

With all that out of the way, let’s plug that into our generate_code function we created earlier to see how it interacts with the rest of the code. Inside of codegen/src/lib.rs:

/// Generate Rust code to create an [`asset_bundler::Assets`].
fn generate_code(paths: &[PathBuf], base: &Path) -> String {
  let keys = keys(paths, base);
  let values = paths.iter().map(|p| p.to_string_lossy());

  // double braces to make it a block expression
  let output = quote! {{
    use ::asset_bundler::{Assets, phf::{self, phf_map}};
    Assets::from(phf_map! {
      #( #keys => include_str!(#values) ),*
    })
  }};

  output.to_string()
}

/// Turn paths into relative paths suitable for keys
fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
  let mut keys = Vec::new();

  for path in paths {
    // ignore this failure case for this example
    if let Ok(key) = path.strip_prefix(base) {
      keys.push(key.to_string_lossy().into())
    }
  }

  keys
}

Phew! That actually wraps up the codegen library. I’ll drop the full codegen/src/lib.rs here, and then we can skedaddle to actually using what we just worked on:

use quote::quote;
use std::path::{Path, PathBuf};
use walkdir::WalkDir;

/// Generate Rust code to create an [`asset_bundler::Assets`] from the passed path.
pub fn codegen(path: &Path) -> std::io::Result<String> {
  // canonicalize also checks if the path exists
  // which is the only case that makes sense for us
  let base = path.canonicalize()?;

  let paths = gather_asset_paths(&base);
  Ok(generate_code(&paths, &base))
}

/// Recursively find all files in the passed directory.
fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
  let mut paths = Vec::new();
  for entry in WalkDir::new(base).into_iter().flatten() {
    // we only care about files, ignore directories
    if entry.file_type().is_file() {
      paths.push(entry.into_path())
    }
  }

  paths
}

/// Generate Rust code to create an [`asset_bundler::Assets`].
fn generate_code(paths: &[PathBuf], base: &Path) -> String {
  let keys = keys(paths, base);
  let values = paths.iter().map(|p| p.to_string_lossy());

  // double braces to make it a block expression
  let output = quote! {{
    use ::asset_bundler::{Assets, phf::{self, phf_map}};
    Assets::from(phf_map! {
      #( #keys => include_str!(#values) ),*
    })
  }};

  output.to_string()
}

/// Turn paths into relative paths suitable for keys.
fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
  let mut keys = Vec::new();

  for path in paths {
    // ignore this failure case for this example
    if let Ok(key) = path.strip_prefix(base) {
      keys.push(key.to_string_lossy().into())
    }
  }

  keys
}

Using it

We just made a simple asset bundler in 50 lines of code, and it’s time to use it! We will start off with creating a new example project to consume the two libraries we just created.

First, add a new item to the root Cargo.toml:

[workspace]
members = ["codegen", "example"]

Then, we create the example binary and add our dependencies:

cargo new --bin example
cargo add asset-bundler --path . --package example
cargo add --build asset-bundler-codegen --path codegen --package example
touch example/build.rs
mkdir -p example/assets/scripts

Let’s start off the Rust code with the build script, since we just created our codegen library. We call the codegen function we created earlier to get the generated code, then write that code somewhere our other code can use it. This goes into example/build.rs:

use std::path::Path;

fn main() {
    let assets = Path::new("assets");
    let codegen = match asset_bundler_codegen::codegen(assets) {
        Ok(codegen) => codegen,
        Err(err) => panic!("failed to generate asset bundler codegen: {err}"),
    };

    let out = std::env::var("OUT_DIR").unwrap();
    let out = Path::new(&out).join("assets.rs");
    std::fs::write(out, codegen.as_bytes()).unwrap();
}
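
One optional refinement, not used in this article's build script: Cargo documents a cargo:rerun-if-changed=PATH instruction that a build script can print to say which paths should trigger a rerun. As far as we understand the Cargo documentation, without any such instruction the script reruns when any file in the package changes, so this is a narrowing rather than a requirement. The helper name rerun_if_changed below is hypothetical.

```rust
use std::path::Path;

/// Build the Cargo instruction that asks for a rerun when `path` changes.
pub fn rerun_if_changed(path: &Path) -> String {
    format!("cargo:rerun-if-changed={}", path.display())
}
```

Inside main in build.rs it would be used as println!("{}", rerun_if_changed(Path::new("assets")));, since Cargo reads these instructions from the build script's standard output.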

We wrote the code to $OUT_DIR/assets.rs because Cargo sets OUT_DIR for build scripts to a directory that is unique to each crate, and to each version of the same crate. The path we just wrote to will be important in just a second, but first let’s create some assets to actually use.

We want to create some assets that are somewhat representative of the example we used at the start. In this case, let’s imagine that these assets are for a webserver and the files are served to the browser. This article isn’t the place for implementing the server, but we will mimic index.html’s script dependencies by making each file’s content the path of the asset it depends on. Run these commands to create them:

echo -n "scripts/loader-a1b2c3.js" > example/assets/index.html
echo -n "scripts/dashboard-f0e9d8.js" > example/assets/scripts/loader-a1b2c3.js
echo -n "console.log('dashboard stuff')" > example/assets/scripts/dashboard-f0e9d8.js

It’s time to put it together and get a glimpse of how it works! We set up the example so that there is only a single “always known” filename, index.html. Our goal is to get the content of the dashboard script using only an "index.html" literal. Here we will jump from each asset to the next in example/src/main.rs:

fn main() {
  // include the assets our build script created
  let assets = include!(concat!(env!("OUT_DIR"), "/assets.rs"));

  let index = assets.get("index.html").unwrap();
  let loader = assets.get(index).unwrap();
  let dashboard = assets.get(loader).unwrap();

  assert_eq!(dashboard, "console.log('dashboard stuff')");
}
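
The chain of unwrap() calls is fine for a demo, but it panics as soon as one link is missing. As a sketch of a more forgiving version, here is the same hop-by-hop lookup written as a loop that returns None instead. To keep it runnable on its own it uses a std HashMap as a stand-in for our Assets type, and the function name follow is hypothetical.

```rust
use std::collections::HashMap;

/// Follow `hops` lookups starting at `start`, treating each asset's
/// content as the path of the next asset.
pub fn follow<'a>(
    assets: &HashMap<&'a str, &'a str>,
    start: &'a str,
    hops: usize,
) -> Option<&'a str> {
    let mut current = start;
    for _ in 0..hops {
        // bail out with None as soon as a link in the chain is missing
        current = assets.get(current).copied()?;
    }
    Some(current)
}
```

With the three example files from above, three hops from index.html reach the dashboard script's content, and a fourth hop returns None because that content is not itself an asset path.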

Don’t forget, you can see all the code on GitHub.

That’s it!

A very bare-bones asset bundler in 94 lines of code, including the example. Treating code generation like any other Rust code is an important aspect of keeping it understandable and maintainable. In those 94 lines, only a handful did actual code generation. Let’s break down what we did…

  • We created the asset-bundler crate that provides the Assets type and re-exports phf so that the generated code can use it.
  • We created the asset-bundler-codegen crate to hold all the functionality codegen uses, along with providing a public function codegen to utilize it.
  • We created the example build script to call the codegen function on its own assets. The generated code was written to a file which we then included in our example/src/main.rs.

While a separate crate isn’t strictly necessary for build script code generation, it is very common. Not only does it help separate concerns and prevent unused dependencies, it also helps prevent circular dependencies on more complex projects. Having a separate crate is required for performing code generation with procedural macros.

Code generation is a powerful tool to bring advanced functionality to your Rust programs. Our example from earlier, Tauri, uses it extensively to perform code injection, compression, and validation for its own asset bundling.

Demystify code generation by writing it as regular Rust code, empowering you to build powerful software.