IEEE VIS 2025 Content: Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation

Jiayi Hong, Christian Seto, Arlen Fan, Ross Maciejewski


Room: 0.11 + 0.12

Keywords

Data visualization, Visualization, Costs, Data models, Benchmark testing, Codes, Training, Data mining, Computational modeling, Cognition

Abstract

In this article, we assess the visualization literacy of two prominent Large Language Models (LLMs): OpenAI's Generative Pre-trained Transformers (GPT), the backend of ChatGPT, and Google's Gemini, previously known as Bard, in order to establish benchmarks of their visualization capabilities. While LLMs have shown promise in generating chart descriptions, captions, and design suggestions, their potential for evaluating visualizations remains under-explored. Collecting evaluation data from human participants has been a bottleneck for visualization research in terms of both time and cost, so if LLMs could serve as evaluators, even in a limited role, they would be a significant resource. To investigate the feasibility of using LLMs in the visualization evaluation process, we explore the extent to which LLMs possess visualization literacy, a crucial factor for their effective use in the field. We conducted a series of experiments using a modified 53-item Visualization Literacy Assessment Test (VLAT) for GPT and Gemini. Our findings indicate that the LLMs we explored currently fail to reach the levels of visualization literacy reported for the general public in the original VLAT study, and that they rely heavily on their pre-existing knowledge rather than on the information presented in the visualization when answering questions.