Stylizing Sparse-View 3D Scenes with Hierarchical Neural Representation
Yifan Wang
Ang Gao
Yi Gong
Yuan Zeng

DOI: 10.1109/TVCG.2025.3558468
Room: Hall E2
Keywords
Three-dimensional displays, Neural radiance field, Geometry, Image reconstruction, Rendering (computer graphics), Optimization, Semantics, Visualization, Training, Encoding
Abstract
3D scene stylization refers to generating stylized images of a scene from arbitrary novel viewpoints according to a given set of style images, while maintaining consistency when the scene is rendered from different views. Recently, several 3D style transfer methods have been proposed that leverage the scene reconstruction capabilities of pre-trained neural radiance fields (NeRFs). To stylize a scene this way, one must first reconstruct a photo-realistic radiance field from collected images of the scene. However, when only sparse input views are available, pre-trained few-shot NeRFs often suffer from high-frequency artifacts, which arise as a by-product of the high-frequency details introduced to improve reconstruction quality. Is it possible to generate more faithful stylized scenes from sparse inputs by directly optimizing an encoding-based scene representation toward the target style? In this paper, we approach the stylization of sparse-view scenes by disentangling content semantics from style textures. We propose a coarse-to-fine sparse-view scene stylization framework in which a novel hierarchical encoding-based neural representation generates high-quality stylized scenes directly from implicit scene representations. We also propose a new optimization strategy with content strength annealing that achieves realistic stylization and better content preservation. Extensive experiments demonstrate that our method achieves high-quality stylization of sparse-view scenes and outperforms fine-tuning-based baselines in both stylization quality and efficiency.
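To give a rough intuition for the content strength annealing mentioned in the abstract, the sketch below shows one plausible way a content-loss weight could be decayed during stylization optimization, so that early iterations favor content preservation and later iterations let the style objective dominate. The schedule shape, function names, and parameter values are illustrative assumptions, not the authors' implementation.

```python
import math

def content_weight(step: int, total_steps: int,
                   w_start: float = 10.0, w_end: float = 1.0) -> float:
    """Hypothetical content-strength annealing schedule.

    The weight on the content loss decays smoothly (cosine decay here)
    from w_start to w_end over the course of optimization, so early steps
    emphasize content preservation and later steps emphasize style.
    """
    t = min(max(step / total_steps, 0.0), 1.0)
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * t))

# Illustrative combined objective at each optimization step:
#   L_total = content_weight(step, T) * L_content + L_style
if __name__ == "__main__":
    for step in range(0, 1001, 250):
        print(step, round(content_weight(step, total_steps=1000), 3))
```

Printing the schedule at a few steps shows the weight falling from 10.0 toward 1.0; any monotone decay (linear, exponential) could play the same role under this reading of the abstract.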