X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

Abstract

This paper presents X-Oscar, a progressive framework designed for generating high-quality text-guided avatars. While notable advancements have been made in this field, existing methods suffer from several limitations, such as producing over-saturated and low-quality output. To create high-quality 3D avatars, X-Oscar follows a “Geometry→Texture→Animation” paradigm. This framework enables the gradual generation of avatars, reducing the complexity of optimization through step-by-step learning. To address the issue of over-saturation, we propose Adaptive Variational Parameter (AVP), which represents a 3D avatar as a distribution rather than fixed parameters. Additionally, we introduce Avatar-aware Score Distillation Sampling (ASDS), a novel technique that incorporates avatar-aware noise into the rendered image instead of random noise. This modification significantly enhances the quality of the generated results. Extensive evaluations have been conducted to assess the effectiveness of X-Oscar in generating complex shapes, appearances, and poses of 3D avatars.
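As a rough illustration of what "a distribution rather than fixed parameters" can mean, the sketch below stores a mean and a standard deviation per parameter and draws a sample from them with Gaussian reparameterization. This is an assumption made for illustration only: the text above does not state the exact form AVP takes, and the function name `sample_parameters` and the toy values are hypothetical.

```python
import random


def sample_parameters(mean, std, rng):
    """Draw one parameter vector as mean + std * eps, with eps ~ N(0, 1).

    Assumed Gaussian form for illustration; not taken from the X-Oscar code.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


# Toy values: three scalar "avatar parameters" with per-parameter spread.
rng = random.Random(0)
mean = [0.5, -1.0, 2.0]
std = [0.1, 0.0, 0.3]  # a zero std reduces that entry to a fixed parameter

sample = sample_parameters(mean, std, rng)
print(sample)
```

With every standard deviation set to zero, the sample equals the mean, which recovers the fixed-parameter representation as a special case.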

Method

Overview of X-Oscar, which consists of geometry modeling, appearance modeling, and animation refinement. X-Oscar is a progressive framework for text-to-avatar generation that follows a “Geometry→Texture→Animation” paradigm. This approach decomposes the complex task of avatar generation into a series of manageable subtasks, each focusing on one aspect of the avatar’s creation. In geometry modeling, we optimize the geometry of the avatar, represented by the SMPL-X model, to align with the input text prompt using a differentiable rendering pipeline. This stage yields a mesh whose shape matches the prompt. In appearance modeling, we represent the avatar’s appearance by optimizing an albedo map. In animation refinement, we change the pose of the avatar and optimize both geometry and appearance to handle parts that are inevitably occluded. Minimizing the animation loss refines the geometry and appearance of the avatar across various poses, which improves quality and reduces artifacts in the final result.
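The three stages described above can be summarized as a simple schedule that records which parameter groups are optimized in each stage and whether the pose is varied. The following is a minimal sketch, not the authors' implementation: the group names `geometry` and `albedo`, the `vary_pose` flag, and the step counts are hypothetical, and keeping the pose fixed during the first two stages is an assumption inferred from the description.

```python
# Illustrative stage schedule; all names and values here are hypothetical.
STAGES = (
    # (stage name, parameter groups being optimized, whether poses are varied)
    ("geometry", ("geometry",), False),
    ("appearance", ("albedo",), False),
    ("animation", ("geometry", "albedo"), True),
)


def make_schedule(steps_per_stage):
    """Expand the stages into a flat, ordered list of per-step settings."""
    schedule = []
    for name, groups, vary_pose in STAGES:
        for _ in range(steps_per_stage[name]):
            schedule.append(
                {"stage": name, "optimize": groups, "vary_pose": vary_pose}
            )
    return schedule


if __name__ == "__main__":
    for step in make_schedule({"geometry": 1, "appearance": 1, "animation": 1}):
        print(step)
```

In a real training loop, each entry would select which tensors receive gradients and which pose is rendered at that step; the loss terms themselves are outside the scope of this sketch.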

Avatar Creation

X-Oscar enables the creation of delicate animatable 3D avatars from text prompts.

Generation process

We demonstrate the generation process of X-Oscar. It can be observed that the objects generated by X-Oscar exhibit high quality and fidelity.


Anna in Frozen	Warren Buffett	Hermione Granger

Ada Wong	Aragorn from The Lord of the Rings	Flynn Rider

Canonical Avatars

Guided solely by a text description, X-Oscar can produce a high-quality canonical 3D avatar.


Aladdin in Aladdin	Frodo Baggins from The Lord of the Rings	Batman	Captain America

Gardener	Geralt of Rivia	IronMan	Jeff Bezos

Knight	Link from Zelda	Mulan	Steven Paul Jobs

Animatable Avatars (Same Pose)

Given a motion sequence, X-Oscar can animate the generated 3D avatars.

Animatable Avatars (Different Poses)

X-Oscar can generate 3D human avatars in diverse poses while preserving high-quality texture and geometry.

Motion Comparison

(The motion of AvatarCLIP is generated by the professional tool Mixamo.)

Application: Virtual Try-On

X-Oscar can facilitate avatar customization through text editing, as depicted in the following results. It is evident that X-Oscar is capable of generating realistic objects.


Jack Ma wearing a flowing sky-blue sundress	Jack Ma wearing a blue beanie, a black leather jacket, and blue jeans	Jack Ma wearing a blue shirt	Jack Ma wearing a down jacket	Jack Ma wearing a green t-shirt and a blue jeans


Jack Ma wearing a pink jacket	Jack Ma wearing a suit	Jack Ma wearing fitness clothing	Jack Ma wearing ski clothes	Jack Ma wearing a navy blue beanie, a blue sweater, and gray trousers

Application: Edit result in Blender software

The high-quality 3D models produced by X-Oscar can be edited directly in popular 3D graphics software such as Blender.