A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

1Xiamen University
*Equal contribution Corresponding author


This paper presents X-Oscar, a progressive framework designed for generating high-quality text-guided avatars. While notable advancements have been made in this field, existing methods suffer from several limitations, such as producing over-saturated and low-quality output. To create high-quality 3D avatars, X-Oscar follows a “Geometry→Texture→Animation” paradigm. This framework enables the gradual generation of avatars, reducing the complexity of optimization through step-by-step learning. To address the issue of over-saturation, we propose Adaptive Variational Parameter (AVP), which represents a 3D avatar as a distribution rather than fixed parameters. Additionally, we introduce Avatar-aware Score Distillation Sampling (ASDS), a novel technique that incorporates avatar-aware noise into the rendered image instead of random noise. This modification significantly enhances the quality of the generated results. Extensive evaluations have been conducted to assess the effectiveness of X-Oscar in generating complex shapes, appearances, and poses of 3D avatars.
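
To make the idea behind AVP concrete, the following is a minimal sketch of a parameter stored as a distribution rather than a fixed value. It assumes a Gaussian parameterization sampled with the standard reparameterization trick; the class name `VariationalParameter` and the `mean`/`log_std` fields are hypothetical and chosen for illustration, and the adaptive schedule used by X-Oscar is not reproduced here.

```python
import math
import random


class VariationalParameter:
    """A scalar stored as a Gaussian (mean, log-std) instead of a fixed value.

    Hypothetical illustration only; not the X-Oscar implementation.
    """

    def __init__(self, mean, log_std):
        self.mean = mean
        self.log_std = log_std

    def sample(self, rng):
        # Reparameterization: value = mean + std * eps, with eps ~ N(0, 1).
        eps = rng.gauss(0.0, 1.0)
        return self.mean + math.exp(self.log_std) * eps


rng = random.Random(0)
param = VariationalParameter(mean=0.5, log_std=math.log(0.1))

# Each optimization step would draw a fresh sample instead of reading one value.
samples = [param.sample(rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
```

In a real pipeline the mean and log-std would both be learnable, and each rendering step would draw a new sample, so the optimizer sees a neighborhood of avatars rather than a single point estimate.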


Overview of X-Oscar, which consists of geometry modeling, appearance modeling, and animation refinement. X-Oscar is a progressive framework for text-to-avatar generation that follows a “Geometry→Texture→Animation” paradigm. This approach decomposes the complex task of avatar generation into a series of manageable subtasks, each focusing on a specific aspect of the avatar’s creation. In geometry modeling, we optimize the geometry of the avatar, represented by the SMPL-X model, to align with the input text prompt through a differentiable rendering pipeline, which yields a mesh whose shape matches the prompt. In appearance modeling, we represent the avatar’s appearance by optimizing an albedo map. In animation refinement, we vary the pose of the avatar and jointly optimize geometry and appearance to handle regions that are inevitably occluded. Minimizing the animation loss refines the geometry and appearance of the avatar across poses, improving quality and reducing artifacts in the final result.
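
The staged schedule described above can be sketched as a loop in which each stage updates only some parameter groups. This is a toy illustration under stated assumptions: the quadratic loss in `toy_grads` stands in for the rendering-based losses, and the names `STAGES`, `gradient_step`, and `run_progressive` are hypothetical rather than taken from the X-Oscar code.

```python
def gradient_step(params, grads, trainable, lr=0.1):
    """Update only the parameters named in `trainable`; the rest stay frozen."""
    return {
        k: (v - lr * grads[k] if k in trainable else v)
        for k, v in params.items()
    }


def toy_grads(params, targets):
    """Gradient of sum((p - target)^2); a stand-in for a rendering-based loss."""
    return {k: 2.0 * (params[k] - targets[k]) for k in params}


# Hypothetical schedule: which parameter groups each stage may update.
STAGES = [
    ("geometry", {"geometry"}),             # shape only
    ("appearance", {"albedo"}),             # texture only, geometry frozen
    ("animation", {"geometry", "albedo"}),  # joint refinement across poses
]


def run_progressive(params, stage_targets, steps=50):
    """Run the stages in order and record the parameters after each one."""
    history = []
    for name, trainable in STAGES:
        for _ in range(steps):
            grads = toy_grads(params, stage_targets[name])
            params = gradient_step(params, grads, trainable)
        history.append((name, dict(params)))
    return params, history


params = {"geometry": 0.0, "albedo": 0.0}
stage_targets = {
    "geometry": {"geometry": 1.0, "albedo": 0.0},
    "appearance": {"geometry": 1.0, "albedo": 2.0},
    "animation": {"geometry": 1.1, "albedo": 2.1},
}
final, history = run_progressive(params, stage_targets)
```

Freezing the albedo during the first stage and the geometry during the second keeps each subproblem small, and the final stage is the only one in which both groups move together.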

Avatar Creation

X-Oscar creates detailed, animatable 3D avatars from text prompts.

Generation process

We show the generation process of X-Oscar for the prompts below. The generated avatars exhibit high quality and fidelity to the prompt.

Anna in Frozen; Warren Buffett; Hermione Granger
Ada Wong; Aragorn from The Lord of the Rings; Flynn Rider



Canonical Avatars

Guided solely by a text description, X-Oscar can produce a high-quality canonical 3D avatar.

Aladdin in Aladdin; Frodo Baggins from The Lord of the Rings; Batman; Captain America
Gardener; Geralt of Rivia; Iron Man; Jeff Bezos
Knight; Link from Zelda; Mulan; Steven Paul Jobs



Animatable Avatars (Same Pose)

Given a motion sequence, X-Oscar can animate the generated 3D avatars.



Animatable Avatars (Different Poses)

X-Oscar can generate 3D human avatars in diverse poses while preserving high-quality texture and geometry.



Motion Comparison

(The motion for AvatarCLIP is generated with the professional tool Mixamo.)



Application: Virtual Try-On

X-Oscar supports avatar customization by editing the text prompt, as shown in the following results, where only the clothing description changes.

Jack Ma wearing a flowing sky-blue sundress; Jack Ma wearing a blue beanie, a black leather jacket, and blue jeans; Jack Ma wearing a blue shirt; Jack Ma wearing a down jacket; Jack Ma wearing a green t-shirt and blue jeans
Jack Ma wearing a pink jacket; Jack Ma wearing a suit; Jack Ma wearing fitness clothing; Jack Ma wearing ski clothes; Jack Ma wearing a navy blue beanie, a blue sweater, and gray trousers



Application: Editing Results in Blender

The high-quality 3D models produced by X-Oscar can be edited directly in popular 3D graphics software such as Blender.