IEEE VIS 2025 Content: GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP

Myeongwon Jung
Takanori Fujiwara
Jaemin Jo

Practitioners who work with high-dimensional data and use dimensionality reduction (DR) techniques, such as data scientists, machine learning engineers, and biologists, will find this paper particularly relevant. Our method provides a way to assess the stability of low-dimensional projections against the stochasticity of UMAP. By applying our technique, practitioners can gain valuable insights into their data, such as identifying which instances are unstable in the projection space or which data points may shift between clusters across different DR runs. This can lead to more informed interpretations and increased confidence in downstream analyses and visualizations.
Keywords

Dimensionality reduction, manifold learning, stochastic optimization, reliability, visualization, WebGPU

Abstract

Despite the widespread use of Uniform Manifold Approximation and Projection (UMAP), the impact of its stochastic optimization process on the results remains underexplored. We observed that it often produces unstable results in which the projections of data points are determined mostly by chance rather than by their neighboring structures. To address this limitation, we introduce (r,d)-stability to UMAP: a framework that analyzes the stochastic positioning of data points in the projection space. To assess how stochastic elements—specifically, initial projection positions and negative sampling—affect UMAP results, we introduce “ghosts”, or duplicates of data points representing potential positional variations due to stochasticity. We define a data point’s projection as (r,d)-stable if its ghosts, perturbed within a circle of radius r in the initial projection, remain confined within a circle of radius d in their final positions. To efficiently compute the ghost projections, we develop an adaptive dropping scheme that reduces runtime by up to 60% compared to an unoptimized baseline while retaining approximately 90% of the unstable points. We also present a visualization tool that supports interactive exploration of the (r,d)-stability of data points. Finally, we demonstrate the effectiveness of our framework by examining the stability of projections of real-world datasets and present usage guidelines for its effective use.
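The definition above can be made concrete with a small sketch. The helper names (`perturb_initial`, `is_rd_stable`) and the centroid-centered reading of the radius-d circle are illustrative assumptions, not the paper's implementation: we sample ghost starting points uniformly within a disk of radius r around a point's initial projection, and then check whether a set of final ghost positions fits inside a circle of radius d.

```python
import numpy as np

def perturb_initial(position, r, n_ghosts, rng):
    """Sample ghost starting points uniformly within a disk of radius r
    around the original initial projection position (illustrative)."""
    angles = rng.uniform(0.0, 2.0 * np.pi, n_ghosts)
    radii = r * np.sqrt(rng.uniform(0.0, 1.0, n_ghosts))  # uniform over the disk
    offsets = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
    return position + offsets

def is_rd_stable(final_positions, d):
    """Check whether all final ghost positions lie within a circle of
    radius d centered at their centroid (one plausible choice of center)."""
    center = final_positions.mean(axis=0)
    dists = np.linalg.norm(final_positions - center, axis=1)
    return bool(np.all(dists <= d))

rng = np.random.default_rng(0)
ghosts = perturb_initial(np.array([0.0, 0.0]), r=0.1, n_ghosts=16, rng=rng)
# In the real pipeline, the ghosts would be optimized by UMAP alongside the
# original point; here we only illustrate the stability check itself.
print(is_rd_stable(ghosts, d=0.2))
```

In practice, the final positions would come from running UMAP's optimization on each ghost; a point whose ghosts scatter widely (the check returns `False` for a small d) is the kind of unstable projection the framework aims to surface.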