IEEE VIS 2024 Content: Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion

Seongmin Lee - Georgia Institute of Technology, Atlanta, United States

Benjamin Hoover - Georgia Institute of Technology, Atlanta, United States; IBM Research AI, Cambridge, United States

Hendrik Strobelt - IBM Research AI, Cambridge, United States

Zijie J. Wang - Georgia Institute of Technology, Atlanta, United States

ShengYun Peng - Georgia Institute of Technology, Atlanta, United States

Austin P Wright - Georgia Institute of Technology, Atlanta, United States

Kevin Li - Georgia Institute of Technology, Atlanta, United States

Haekyu Park - Georgia Institute of Technology, Atlanta, United States

Haoyang Yang - Georgia Institute of Technology, Atlanta, United States

Duen Horng (Polo) Chau - Georgia Institute of Technology, Atlanta, United States

Room: Bayshore VI

Session time: 2024-10-17T17:54:00Z
Exemplar figure caption: With Diffusion Explainer, users can visually examine how a text prompt (e.g., “a cute and adorable bunny... pixar character”) is encoded by the Text Representation Generator into vectors that guide the Image Representation Refiner to iteratively refine the vector representation of the image being generated. The Timestep Controller enables users to review the incremental improvements in image quality and adherence to the prompt over timesteps. Diffusion Explainer tightly integrates a visual overview of Stable Diffusion’s complex components with detailed explanations of their underlying operations, enabling users to fluidly transition between multiple levels of abstraction through animations and interactive elements.
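The components named in the caption correspond to the familiar stages of the Stable Diffusion pipeline: text encoding, iterative refinement of a noisy latent over timesteps, and decoding into an image. As a rough, hedged illustration of those stages (not code from the tool itself), the sketch below uses the Hugging Face diffusers library; the model checkpoint, step count, and latent size are assumptions, and classifier-free guidance is omitted for brevity.

    import torch
    from diffusers import StableDiffusionPipeline

    torch.set_grad_enabled(False)  # inference only

    # Assumed checkpoint; shown only to illustrate the pipeline stages.
    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

    prompt = "a cute and adorable bunny... pixar character"

    # Text Representation Generator: encode the prompt into text embedding vectors.
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            return_tensors="pt")
    text_embeddings = pipe.text_encoder(tokens.input_ids.to("cuda"))[0]

    # Start from random noise in latent space.
    size = pipe.unet.config.sample_size
    latents = torch.randn((1, pipe.unet.config.in_channels, size, size), device="cuda")
    pipe.scheduler.set_timesteps(50)  # Timestep Controller: number of refinement steps (assumed)
    latents = latents * pipe.scheduler.init_noise_sigma

    # Image Representation Refiner: iteratively refine the image representation,
    # guided by the text embeddings at every timestep.
    for t in pipe.scheduler.timesteps:
        noise_pred = pipe.unet(latents, t, encoder_hidden_states=text_embeddings).sample
        latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample

    # Decode the refined latent representation into the final image tensor.
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample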
Keywords

Machine Learning, Statistics, Modelling, and Simulation Applications; Software Prototype

Abstract

Diffusion-based generative models’ impressive ability to create convincing images has garnered global attention. However, their complex structures and operations often pose challenges for non-experts to grasp. We present Diffusion Explainer, the first interactive visualization tool that explains how Stable Diffusion transforms text prompts into images. Diffusion Explainer tightly integrates a visual overview of Stable Diffusion’s complex structure with explanations of the underlying operations. By comparing image generation of prompt variants, users can discover the impact of keyword changes on image generation. A 56-participant user study demonstrates that Diffusion Explainer offers substantial learning benefits to non-experts. Our tool has been used by over 10,300 users from 124 countries at https://poloclub.github.io/diffusion-explainer/.
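The prompt-comparison idea in the abstract can be reproduced outside the tool by generating images for two prompt variants from the same random seed, so that both runs start from identical noise and any difference in the outputs reflects only the keyword change. The following is a minimal sketch under that assumption; the model checkpoint, seed, and example prompts are illustrative, not taken from the paper.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

    prompts = [
        "a cute and adorable bunny, pixar character",       # base prompt (assumed)
        "a cute and adorable bunny, realistic photograph",  # keyword variant (assumed)
    ]

    for i, prompt in enumerate(prompts):
        # Same seed for both runs: identical initial noise, so only the prompt differs.
        generator = torch.Generator("cuda").manual_seed(42)
        image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5,
                     generator=generator).images[0]
        image.save(f"variant_{i}.png")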