X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization
via Dynamic Textual Guidance

  • ¹ MAC Lab, School of Informatics, Xiamen University
  • ² Institute of Artificial Intelligence, Xiamen University
  • ³ CATL
  • ✉ Corresponding author
Accepted by ICCV 2023 (Main Track)
TL;DR: X-Mesh is a text-driven model that stylizes a mesh by manipulating its vertex colors and displacements.
Abstract
Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh under the supervision of a CLIP loss. However, such a text-independent architecture lacks textual guidance when predicting attributes, which leads to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module (TDAM). The TDAM dynamically integrates the guidance of the target text by applying text-relevant spatial and channel-wise attention during vertex feature extraction, resulting in more accurate attribute prediction and faster convergence. Furthermore, existing works lack standard benchmarks and automated metrics for evaluation, and often rely on subjective, non-reproducible user studies to assess the quality of stylized 3D assets. To overcome this limitation, we introduce a new standard text-mesh benchmark, MIT-30, and two automated metrics, which will enable future research to make fair and objective comparisons. Our extensive qualitative and quantitative experiments demonstrate that X-Mesh outperforms previous state-of-the-art methods.
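To make the idea of text-relevant channel-wise and spatial attention more concrete, here is a minimal sketch in plain Python. It is not the TDAM implementation from the paper: the sigmoid gating, the projection matrices `W_c` and `W_s`, and every function name are assumptions chosen only for illustration. The sketch shows one way a text embedding could re-weight per-vertex features, first per channel and then per vertex.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def channel_attention(features, text, W_c):
    """Scale every feature channel by a gate computed from the text embedding.

    features: list of per-vertex feature vectors (num_vertices x channels)
    text:     text embedding (text_dim)
    W_c:      hypothetical projection (channels x text_dim)
    """
    gates = [sigmoid(g) for g in matvec(W_c, text)]  # one gate per channel
    return [[f * g for f, g in zip(vertex, gates)] for vertex in features]


def spatial_attention(features, text, W_s):
    """Scale every vertex by a gate based on its similarity to the projected text.

    W_s: hypothetical projection (channels x text_dim)
    """
    query = matvec(W_s, text)  # text projected into feature space
    scale = math.sqrt(len(query))
    out = []
    for vertex in features:
        score = sum(f * q for f, q in zip(vertex, query)) / scale
        gate = sigmoid(score)  # one gate per vertex
        out.append([f * gate for f in vertex])
    return out


def text_guided_block(features, text, W_c, W_s):
    """Apply channel-wise, then spatial, text-conditioned gating."""
    return spatial_attention(channel_attention(features, text, W_c), text, W_s)


if __name__ == "__main__":
    # Toy inputs: 2 vertices, 3 feature channels, 2-dimensional text embedding.
    feats = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]
    text = [0.2, -0.1]
    W_c = [[0.1, 0.2], [0.3, -0.4], [0.0, 0.5]]
    W_s = [[0.5, 0.1], [-0.2, 0.3], [0.4, 0.0]]
    print(text_guided_block(feats, text, W_c, W_s))
```

In this sketch the weights depend on the text, so the same vertex features are transformed differently for different prompts. A text-independent MLP, by contrast, would apply fixed weights regardless of the prompt.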
Gallery

(a) Stylized Asset && Geometry

Cactus Vase
Bird
Porcelain Bowl
Disney Castle
Fire Dragon
Brick Alien
Cactus Chair
Colorful Lamp
Silver Cat
Wooden Skull


(b) Stylized Asset && Bare Mesh

Wooden Vanity Table
Squirrel
Brick Castle
Superman
Cactus Chair


(c) Training Process

With TDAM
Without TDAM
Steve Jobs
Astronaut Horse
Jeans Alien
Crochet Lamp


(d) Bare Mesh && Geometry && Stylized Asset

A 3D rendering of a wooden phoenix in unreal engine.
A 3D rendering of a dark castle in unreal engine.
A 3D rendering of a Ginger cat with black collar in unreal engine.
A 3D rendering of a BlueWhale in unreal engine.
A 3D rendering of a brown owl standing on a trunk in unreal engine.
A 3D rendering of a crocodile in unreal engine.

BibTeX
@misc{ma2023xmesh,
    title={X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance}, 
    author={Yiwei Ma and Xiaoqing Zhang and Xiaoshuai Sun and Jiayi Ji and Haowei Wang and Guannan Jiang and Weilin Zhuang and Rongrong Ji},
    year={2023},
    eprint={2303.15764},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Related Works

Several related works may also be of interest.

+ Text2Mesh is a previous work on text-to-3D generation and manipulation.

+ TANGO transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner.