IEEE VIS 2024 Content: Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion

Seongmin Lee - Georgia Institute of Technology, Atlanta, United States

Benjamin Hoover - Georgia Institute of Technology, Atlanta, United States; IBM Research AI, Cambridge, United States

Hendrik Strobelt - IBM Research AI, Cambridge, United States

Zijie J. Wang - Georgia Institute of Technology, Atlanta, United States

ShengYun Peng - Georgia Institute of Technology, Atlanta, United States

Austin P Wright - Georgia Institute of Technology, Atlanta, United States

Kevin Li - Georgia Institute of Technology, Atlanta, United States

Haekyu Park - Georgia Institute of Technology, Atlanta, United States

Haoyang Yang - Georgia Institute of Technology, Atlanta, United States

Duen Horng (Polo) Chau - Georgia Institute of Technology, Atlanta, United States

Room: Bayshore VI

Session time: 2024-10-17T17:54:00Z
Exemplar figure caption: With Diffusion Explainer, users can visually examine how a text prompt (e.g., “a cute and adorable bunny... pixar character”) is encoded by the Text Representation Generator into vectors that guide the Image Representation Refiner to iteratively refine the vector representation of the image being generated. The Timestep Controller enables users to review the incremental improvements in image quality and adherence to the prompt over timesteps. Diffusion Explainer tightly integrates a visual overview of Stable Diffusion’s complex components with detailed explanations of their underlying operations, enabling users to fluidly transition between multiple levels of abstraction through animations and interactive elements.
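The components named in the caption correspond to the familiar stages of the Stable Diffusion pipeline: text encoding, iterative refinement of a noisy latent over timesteps, and decoding into an image. As a rough, hedged illustration of those stages (not code from the tool itself), the sketch below uses the Hugging Face diffusers library; the model checkpoint, step count, and latent size are assumptions, and classifier-free guidance is omitted for brevity.

    import torch
    from diffusers import StableDiffusionPipeline

    torch.set_grad_enabled(False)  # inference only

    # Assumed checkpoint; shown only to illustrate the pipeline stages.
    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

    prompt = "a cute and adorable bunny... pixar character"

    # Text Representation Generator: encode the prompt into text embedding vectors.
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            return_tensors="pt")
    text_embeddings = pipe.text_encoder(tokens.input_ids.to("cuda"))[0]

    # Start from random noise in latent space.
    size = pipe.unet.config.sample_size
    latents = torch.randn((1, pipe.unet.config.in_channels, size, size), device="cuda")
    pipe.scheduler.set_timesteps(50)  # Timestep Controller: number of refinement steps (assumed)
    latents = latents * pipe.scheduler.init_noise_sigma

    # Image Representation Refiner: iteratively refine the image representation,
    # guided by the text embeddings at every timestep.
    for t in pipe.scheduler.timesteps:
        noise_pred = pipe.unet(latents, t, encoder_hidden_states=text_embeddings).sample
        latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample

    # Decode the refined latent representation into the final image tensor.
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample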
Keywords

Machine Learning, Statistics, Modelling, and Simulation Applications; Software Prototype

Abstract

Diffusion-based generative models’ impressive ability to create convincing images has garnered global attention. However, their complex structures and operations often pose challenges for non-experts to grasp. We present Diffusion Explainer, the first interactive visualization tool that explains how Stable Diffusion transforms text prompts into images. Diffusion Explainer tightly integrates a visual overview of Stable Diffusion’s complex structure with explanations of the underlying operations. By comparing image generation of prompt variants, users can discover the impact of keyword changes on image generation. A 56-participant user study demonstrates that Diffusion Explainer offers substantial learning benefits to non-experts. Our tool has been used by over 10,300 users from 124 countries at https://poloclub.github.io/diffusion-explainer/.
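The prompt-comparison idea in the abstract can be reproduced outside the tool by generating images for two prompt variants from the same random seed, so that both runs start from identical noise and any difference in the outputs reflects only the keyword change. The following is a minimal sketch under that assumption; the model checkpoint, seed, and example prompts are illustrative, not taken from the paper.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

    prompts = [
        "a cute and adorable bunny, pixar character",       # base prompt (assumed)
        "a cute and adorable bunny, realistic photograph",  # keyword variant (assumed)
    ]

    for i, prompt in enumerate(prompts):
        # Same seed for both runs: identical initial noise, so only the prompt differs.
        generator = torch.Generator("cuda").manual_seed(42)
        image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5,
                     generator=generator).images[0]
        image.save(f"variant_{i}.png")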