IEEE VIS 2025

Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction

Jiyeon Bae, Hyeon Jeon, Jinwook Seo

The paper will interest data scientists and machine-learning engineers who inspect low-dimensional embeddings. It will also benefit visualization practitioners and tool developers responsible for reporting the quality of dimensionality-reduction (DR) projections. In addition, domain researchers in fields where DR is common—such as bioinformatics, HCI, and signal processing—will find it relevant. Readers can apply the workflow immediately by (1) computing empirical correlations among metrics across diverse DR projections; (2) clustering the metrics by their correlation-based similarity; and (3) selecting one representative metric from each cluster to minimize redundancy and bias, thereby making DR evaluations fairer and more trustworthy.
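The correlation step of this workflow can be sketched as follows. The scores, array shapes, and the choice of Spearman rank correlation are illustrative assumptions rather than the paper's exact implementation; in practice, `scores` would hold real metric values (e.g., trustworthiness, continuity) computed over many DR projections.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical setup: each row is one DR projection (e.g., t-SNE, UMAP, MDS runs
# with varied hyperparameters), each column is one evaluation metric.
# Real metric scores would replace this random placeholder.
rng = np.random.default_rng(0)
scores = rng.random((200, 8))                      # (n_projections, n_metrics)
metric_names = [f"metric_{i}" for i in range(scores.shape[1])]

# Step 1: empirical pairwise correlation between metrics, computed across projections.
# Spearman rank correlation is one reasonable choice; the paper's measure may differ.
corr, _ = spearmanr(scores)                        # (n_metrics, n_metrics) matrix
print(np.round(corr, 2))
```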
Keywords

Dimensionality reduction, Evaluation metrics, Correlation analysis, Benchmarking, Visual analytics

Abstract

Evaluating the accuracy of dimensionality reduction (DR) projections in preserving the structure of high-dimensional data is crucial for reliable visual analytics. Diverse evaluation metrics targeting different structural characteristics have thus been developed. However, evaluations of DR projections can become biased if highly correlated metrics—those measuring similar structural characteristics—are inadvertently selected, favoring DR techniques that emphasize those characteristics. To address this issue, we propose a novel workflow that reduces bias in metric selection by clustering metrics based on their empirical correlations rather than on their intended design characteristics alone. The workflow computes metric similarity from pairwise correlations, clusters metrics to minimize overlap, and selects one representative metric from each cluster. Quantitative experiments demonstrate that our approach improves the stability of DR evaluation, indicating that the workflow helps mitigate evaluation bias.
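The clustering and representative-selection steps can be sketched in the same spirit. The 1 − |correlation| dissimilarity, average linkage, fixed cluster count, and medoid-style representative rule below are placeholder choices for illustration, not necessarily those used in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

# Reuse the hypothetical setup from the previous sketch so this snippet runs on its own.
rng = np.random.default_rng(0)
scores = rng.random((200, 8))                      # placeholder metric scores
metric_names = [f"metric_{i}" for i in range(scores.shape[1])]
corr, _ = spearmanr(scores)

# Step 2: turn correlations into dissimilarities and cluster the metrics, so that
# strongly correlated metrics (measuring similar structure) land in the same cluster.
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")

# Step 3: keep one representative metric per cluster -- here the cluster "medoid",
# i.e., the member most correlated on average with the rest of its cluster.
representatives = []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    if len(members) == 1:
        representatives.append(members[0])
        continue
    sub = np.abs(corr)[np.ix_(members, members)]
    representatives.append(members[sub.mean(axis=1).argmax()])

print([metric_names[i] for i in representatives])
```

Keeping only the representatives, rather than every metric in a cluster, is what reduces redundancy: each structural characteristic contributes one metric to the final evaluation instead of several correlated ones.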