FontCraft: Multimodal Font Design Using Interactive Bayesian Optimization

1 The University of Tokyo, Japan  2 Adobe Research, Basel, Switzerland  3 Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France  4 National Institute of Advanced Industrial Science and Technology (AIST), Japan  5 Reichman University, Israel
FontCraft teaser image.

Abstract

Creating new fonts requires substantial human effort and professional typographic knowledge. Despite rapid advances in automatic font generation models, existing methods require users to prepare pre-designed characters in the target style using font-editing software, which poses a barrier for non-expert users.

To address this limitation, we propose FontCraft, a system that enables font generation without relying on pre-designed characters. Our approach integrates the exploration of a font-style latent space with human-in-the-loop preferential Bayesian optimization and multimodal references, facilitating efficient exploration and enhancing user control. Moreover, FontCraft allows users to revisit previous designs, retracting their earlier choices in the preferential Bayesian optimization process. Once users finish editing the style of a selected character, they can propagate it to the remaining characters and further refine them as needed. The system then generates a complete outline font in OpenType format.

We evaluated the effectiveness of FontCraft through a user study comparing it to a baseline interface. Results from both quantitative and qualitative evaluations demonstrate that FontCraft enables non-expert users to design fonts efficiently.

Video

Key points

Explore one-dimensional subspace

Users can design a font by exploring a one-dimensional subspace of the font-style latent space, as sketched below.

Interpolation between the start and end reference images of the subspace.
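As an illustration, exploring this subspace amounts to moving a slider along a line between two latent codes. The following minimal Python sketch uses hypothetical names: z_start and z_end stand for the endpoint latent codes and decode_glyph for a pretrained glyph decoder; none of these are part of the released system.

import numpy as np

def point_on_subspace(z_start, z_end, t):
    # Linear interpolation along the one-dimensional subspace spanned by
    # the two endpoint latent codes; t in [0, 1] is the slider position.
    return (1.0 - t) * np.asarray(z_start) + t * np.asarray(z_end)

# Hypothetical usage: decode the selected point into a glyph image.
# glyph = decode_glyph(point_on_subspace(z_start, z_end, slider_value))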

Search subspace construction

The one-dimensional search subspace is constructed either by preferential Bayesian optimization or from multimodal references, as sketched below.

Search subspace construction.
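Conceptually, the subspace is a line segment in latent space between the user’s current best style and a target style. The sketch below illustrates this under assumed names: maximize_acquisition and embed_reference are hypothetical stand-ins for the preferential-BO suggestion step and the reference-embedding step, respectively.

import numpy as np

def build_search_subspace(z_best, z_target):
    # Parameterize the line segment between the current best latent code and
    # a target latent code; t in [0, 1] selects one point on the segment.
    return lambda t: (1.0 - t) * np.asarray(z_best) + t * np.asarray(z_target)

# The target endpoint may come from either source (hypothetical calls):
# z_target = maximize_acquisition(preference_model)  # suggested by preferential BO
# z_target = embed_reference(user_reference)         # derived from a multimodal reference
# subspace = build_search_subspace(z_best, z_target)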

Multimodal references

Users can provide multimodal references, such as text and images, to guide the exploration. These references are encoded by FontCLIP and embedded into the latent space, as sketched below.

Multimodal references.
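The snippet below is a rough sketch of this step rather than FontCLIP’s actual API: encode_text, encode_image, and project_to_latent are hypothetical handles for the FontCLIP encoders and for a mapping from the FontCLIP embedding space into the font-style latent space.

def embed_reference(reference, encode_text, encode_image, project_to_latent):
    # Encode a text or image reference and project the embedding into the
    # font-style latent space so it can serve as a subspace endpoint.
    embedding = encode_text(reference) if isinstance(reference, str) else encode_image(reference)
    return project_to_latent(embedding)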

Retractable preference modeling

Our system allows users to revisit previous designs by retracting their earlier choices in the preferential Bayesian optimization process. This frees users from the limitations of a forward-only design workflow.

Retractable preference modeling.
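One way to picture retraction is as bookkeeping over pairwise choices: each comparison the user makes is stored, and retracting a step removes that observation before the preference model is refit. The class below is a minimal sketch of this idea; refit_preference_model is a hypothetical stand-in for refitting the preference model (e.g., a Gaussian process) on the remaining data.

class RetractablePreferences:
    def __init__(self):
        self.pairs = []  # list of (chosen_latent, rejected_latent) observations

    def record(self, chosen, rejected):
        # Store one pairwise choice made during preferential Bayesian optimization.
        self.pairs.append((chosen, rejected))

    def retract(self, step):
        # Undo the choice made at `step`; later suggestions no longer depend on it.
        del self.pairs[step]

# After any change, the model is refit on the remaining pairs (hypothetical call):
# refit_preference_model(prefs.pairs)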

BibTeX

@article{tatsukawa2025fontcraft,
  author    = {Tatsukawa, Yuki and Shen, I-Chao and Dogan, Mustafa Doga and Qi, Anran and Koyama, Yuki and Shamir, Ariel and Igarashi, Takeo},
  title     = {FontCraft: Multimodal Font Design Using Interactive Bayesian Optimization},
  journal   = {CHI},
  year      = {2025},
}