Mesh 4
☆ Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models
Donggoo Kang, Jangyeong Kim, Dasol Jeong, Junyoung Choi, Jeonga Wi, Hyunmin Lee, Joonho Gwon, Joonki Paik
Current texture synthesis methods, which generate textures from fixed
viewpoints, suffer from inconsistencies due to the lack of global context and
geometric understanding. Meanwhile, recent advancements in video generation
models have demonstrated remarkable success in achieving temporally consistent
videos. In this paper, we introduce VideoTex, a novel framework for seamless
texture synthesis that leverages video generation models to address both
spatial and temporal inconsistencies in 3D textures. Our approach incorporates
geometry-aware conditions, enabling precise utilization of 3D mesh structures.
Additionally, we propose a structure-wise UV diffusion strategy, which enhances
the generation of occluded areas by preserving semantic information, resulting
in smoother and more coherent textures. VideoTex not only achieves smoother
transitions across UV boundaries but also ensures high-quality, temporally
stable textures across video frames. Extensive experiments demonstrate that
VideoTex outperforms existing methods in texture fidelity, seam blending, and
stability, paving the way for dynamic real-time applications that demand both
visual quality and temporal coherence.
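The abstract describes conditioning a video generation model on per-frame geometric renders and then aggregating the frames into a UV texture. Below is a minimal sketch of that pipeline under my own assumptions; the helper callables (render_geometry, video_diffusion, bake_to_uv) and the orbit sampling are illustrative stand-ins, not the authors' implementation.

```python
# Hypothetical sketch: render per-frame depth/normal maps along a camera orbit,
# use them as geometry-aware conditions for a (user-supplied) video diffusion
# model, then back-project the generated frames into UV texture space.
import torch

def orbit_cameras(num_frames: int, radius: float = 2.5) -> torch.Tensor:
    """Camera positions evenly spaced on a horizontal orbit around the mesh."""
    angles = torch.linspace(0, 2 * torch.pi, num_frames + 1)[:-1]
    return torch.stack(
        [radius * torch.cos(angles),
         torch.zeros_like(angles),
         radius * torch.sin(angles)],
        dim=-1,
    )

def synthesize_texture(mesh, render_geometry, video_diffusion, bake_to_uv,
                       num_frames: int = 24):
    """Generate a temporally consistent texture for `mesh` (conceptual only).

    render_geometry(mesh, cam)      -> dict with 'depth' and 'normal' maps
    video_diffusion(conds)          -> list of RGB frames, one per condition
    bake_to_uv(mesh, cams, frames)  -> UV texture aggregated over views
    """
    cams = orbit_cameras(num_frames)
    # Geometry-aware conditions: one depth/normal pair per video frame.
    conds = [render_geometry(mesh, cam) for cam in cams]
    frames = video_diffusion(conds)            # temporally coherent RGB frames
    texture = bake_to_uv(mesh, cams, frames)   # back-project into UV space
    return texture
```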
☆ PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling ICCV 2025
Skinning and rigging are fundamental components in animation, articulated
object reconstruction, motion transfer, and 4D generation. Existing approaches
predominantly rely on Linear Blend Skinning (LBS), due to its simplicity and
differentiability. However, LBS introduces artifacts such as volume loss and
unnatural deformations, and it fails to model elastic materials like soft
tissues, fur, and flexible appendages (e.g., elephant trunks, ears, and fatty
tissues). In this work, we propose PhysRig: a differentiable physics-based
skinning and rigging framework that overcomes these limitations by embedding
the rigid skeleton into a volumetric representation (e.g., a tetrahedral mesh),
which is simulated as a deformable soft-body structure driven by the animated
skeleton. Our method leverages continuum mechanics and discretizes the object
as particles embedded in an Eulerian background grid to ensure
differentiability with respect to both material properties and skeletal motion.
Additionally, we introduce material prototypes, significantly reducing the
learning space while maintaining high expressiveness. To evaluate our
framework, we construct a comprehensive synthetic dataset using meshes from
Objaverse, The Amazing Animals Zoo, and Mixamo, covering diverse object
categories and motion patterns. Our method consistently outperforms traditional
LBS-based approaches, generating more realistic and physically plausible
results. Furthermore, we demonstrate the applicability of our framework in the
pose transfer task, highlighting its versatility for articulated object
modeling.
comment: Accepted by ICCV 2025
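The abstract's core discretization, particles embedded in an Eulerian background grid, is the particle-in-cell family of methods. The sketch below is my own simplified, differentiable toy of one transfer step (nearest-node weights, gravity only), not PhysRig's implementation; real MPM solvers use B-spline kernels and stress updates from the material model.

```python
# Simplified, differentiable particle-in-cell step (illustrative assumption):
# scatter particle momentum to grid nodes, update grid velocities with an
# external force, and gather velocities back to the particles.
import torch

def pic_step(x, v, m, grid_res=32, dx=1.0 / 32, dt=1e-3, g=-9.8):
    """x, v: (N, 3) particle positions/velocities; m: (N,) particle masses."""
    n = grid_res ** 3
    idx3 = (x / dx).long().clamp(0, grid_res - 1)           # nearest grid node
    idx = (idx3[:, 0] * grid_res + idx3[:, 1]) * grid_res + idx3[:, 2]

    # Particle-to-grid: accumulate mass and momentum (differentiable in m, v).
    grid_m = torch.zeros(n, dtype=x.dtype).index_add_(0, idx, m)
    grid_p = torch.zeros(n, 3, dtype=x.dtype).index_add_(0, idx, m[:, None] * v)

    grid_v = grid_p / grid_m.clamp(min=1e-8)[:, None]        # momentum -> velocity
    grid_v = grid_v + dt * torch.tensor([0.0, g, 0.0], dtype=x.dtype)  # gravity

    # Grid-to-particle: gather updated velocities and advect positions.
    v_new = grid_v[idx]
    x_new = x + dt * v_new
    return x_new, v_new
```

Because the transfers are built from differentiable tensor operations, gradients can flow from a loss on particle trajectories back to material-dependent quantities, which is the property the paper exploits for learning material prototypes and skeletal motion.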
♻ ☆ Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
We introduce a diffusion-based framework that performs aligned novel view
image and geometry generation via a warping-and-inpainting methodology. Unlike
prior methods that require dense posed images or pose-embedded generative
models limited to in-domain views, our method leverages off-the-shelf geometry
predictors to predict partial geometries viewed from reference images, and
formulates novel-view synthesis as an inpainting task for both image and
geometry. To ensure accurate alignment between generated images and geometry,
we propose cross-modal attention distillation, where attention maps from the
image diffusion branch are injected into a parallel geometry diffusion branch
during both training and inference. This multi-task approach yields
synergistic effects, facilitating geometrically robust image synthesis and
well-defined geometry prediction. We further introduce proximity-based mesh
conditioning to integrate depth and normal cues, interpolating the partial
point cloud and preventing erroneously predicted geometry from influencing the
generation process. Empirically, our method achieves high-fidelity
extrapolative view synthesis on both image and geometry across a range of
unseen scenes, delivers competitive reconstruction quality under interpolation
settings, and produces geometrically aligned colored point clouds for
comprehensive 3D completion. Project page is available at
https://cvlab-kaist.github.io/MoAI.
comment: Project page at https://cvlab-kaist.github.io/MoAI
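The key mechanism, injecting attention maps from the image diffusion branch into a parallel geometry branch, can be illustrated with a short sketch. This is my own minimal reading (an assumption, not the authors' code): both branches share the softmax attention probabilities computed from the image queries and keys, while each branch keeps its own value projections.

```python
# Illustrative cross-modal attention sharing: the geometry branch reuses the
# attention probabilities of the image branch so both modalities attend to the
# same spatial structure and stay aligned.
import torch
import torch.nn.functional as F

def shared_attention(q_img, k_img, v_img, v_geo):
    """All inputs have shape (batch, heads, tokens, dim)."""
    scale = q_img.shape[-1] ** -0.5
    attn = F.softmax(q_img @ k_img.transpose(-2, -1) * scale, dim=-1)
    out_img = attn @ v_img   # standard self-attention for the image branch
    out_geo = attn @ v_geo   # geometry branch: same attention map, its own values
    return out_img, out_geo
```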
♻ ☆ 2D Triangle Splatting for Direct Differentiable Mesh Training
Differentiable rendering with 3D Gaussian primitives has emerged as a
powerful method for reconstructing high-fidelity 3D scenes from multi-view
images. While it offers improvements over NeRF-based methods, this
representation still encounters challenges with rendering speed and advanced
rendering effects, such as relighting and shadow rendering, compared to
mesh-based models. In this paper, we propose 2D Triangle Splatting (2DTS), a
novel method that replaces 3D Gaussian primitives with 2D triangle facelets.
This representation naturally forms a discrete mesh-like structure while
retaining the benefits of continuous volumetric modeling. By incorporating a
compactness parameter into the triangle primitives, we enable direct training
of photorealistic meshes. Our experimental results demonstrate that our
triangle-based method, in its vanilla version (without compactness tuning),
achieves higher fidelity compared to state-of-the-art Gaussian-based methods.
Furthermore, our approach produces reconstructed meshes with superior visual
quality compared to existing mesh reconstruction methods. Please visit our
project page at https://gaoderender.github.io/triangle-splatting.
comment: 13 pages, 8 figures
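One plausible way to read the compactness parameter (my assumption, not the paper's exact formulation) is as a sharpness control on a soft triangle footprint: opacity decays with signed barycentric distance from the triangle interior, and increasing compactness drives the splat toward an opaque, hard-edged facet that can be exported directly as a mesh triangle.

```python
# Toy parameterization of a "soft" 2D triangle primitive (illustrative only).
import torch

def triangle_opacity(bary: torch.Tensor, compactness: torch.Tensor) -> torch.Tensor:
    """bary: (..., 3) barycentric coords of query pixels; compactness: scalar > 0.

    Inside the triangle all coordinates are >= 0, so the smallest coordinate is
    a signed measure of distance to the nearest edge.
    """
    edge_dist = bary.min(dim=-1).values              # < 0 outside, > 0 inside
    return torch.sigmoid(compactness * edge_dist)    # large compactness -> hard edge

# Gradients flow through the soft edge during training, while the rendered
# primitive converges to a binary triangle facet as compactness grows.
opacity = triangle_opacity(torch.tensor([[0.2, 0.3, 0.5]]), torch.tensor(50.0))
```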