IEEE VIS 2025 Content: SimVecVis: A Dataset for Enhancing MLLMs in Visualization Understanding



Can Liu
Chunlin Da
Xiaoxiao Long
Yuxiao Yang
Yu Zhang
Yong WANG

Room: Hall E1

2025-11-05T10:24:00.000Z

Recorded video from this session can be viewed at the following link.
https://youtu.be/rPjFS7xuL5w

Keywords

Visualization LLM, Multimodal LLM, Chart QA

Abstract

Current multimodal large language models (MLLMs), while effective at natural image understanding, struggle with visualization understanding because they cannot decode the data-to-visual mapping or extract structured information. To address these challenges, we propose SimVec, a novel simplified vector format that encodes chart elements such as mark type, position, and size. We demonstrate the effectiveness of SimVec by using MLLMs to reconstruct chart information from SimVec representations. We then build a new visualization dataset, SimVecVis, to enhance the performance of MLLMs in visualization understanding. The dataset consists of three key dimensions: bitmap images of charts, their SimVec representations, and corresponding data-centric question-answering (QA) pairs with explanatory chain-of-thought (CoT) descriptions. We finetune state-of-the-art MLLMs (e.g., MiniCPM and Qwen-VL) on SimVecVis with different dataset dimensions. The experimental results show that finetuning on SimVecVis substantially improves performance on data-centric QA tasks for MLLMs with good spatial perception capabilities (e.g., MiniCPM). Our dataset and source code are available at: https://github.com/VIDA-Lab/SimVecVis.
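
To make the three dataset dimensions concrete, here is a minimal Python sketch of what one sample might look like. It is an illustration only: the abstract does not specify the SimVec syntax, so the class names, field names, and text layout below are all hypothetical and are not the format used in the SimVecVis repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mark:
    """One chart element. Field names are illustrative, not the SimVec spec."""
    mark_type: str  # e.g. "rect" for a bar (assumed vocabulary)
    x: float        # position, in an assumed pixel coordinate system
    y: float
    width: float    # size, in the same assumed units
    height: float


@dataclass
class Sample:
    """Hypothetical dataset entry holding the three dimensions from the abstract."""
    image_path: str        # bitmap image of the chart
    marks: List[Mark]      # simplified vector representation of the chart
    question: str          # data-centric question
    chain_of_thought: str  # explanatory reasoning for the answer
    answer: str


def serialize_marks(marks: List[Mark]) -> str:
    """Render each mark as one compact text line (an assumed layout)."""
    return "\n".join(
        f"{m.mark_type} x={m.x:g} y={m.y:g} w={m.width:g} h={m.height:g}"
        for m in marks
    )


def tallest_mark(marks: List[Mark]) -> Mark:
    """Return the mark with the greatest height, e.g. the tallest bar."""
    return max(marks, key=lambda m: m.height)
```

The point of the sketch is that a question such as "which bar is tallest?" reduces to a lookup over explicit mark geometry once the chart is available in a structured text form, rather than being inferred from pixels.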