Yiwei Ma (马祎炜)

Algorithm Engineer @ dots, Xiaohongshu

Research focus: Multimodal Large Language Models, Text-to-Image Pretraining

✉ mayiwei1998@163.com

Biography

I am an Algorithm Engineer at the dots team of Xiaohongshu (RED), specializing in Multimodal Large Language Models and Text-to-Image Pretraining. I received my Ph.D. from the Department of Artificial Intelligence, School of Informatics, Xiamen University, advised by Prof. Rongrong Ji and Prof. Xiaoshuai Sun. Feel free to reach out to me via email.

My recent research focuses on Multimodal Large Language Models (MLLMs) and Text-to-Image Pretraining. I have published 27 papers in top-tier CCF-A/B conferences and journals (17 as first or co-first author, 3 Orals), with 1500+ Google Scholar citations. I serve as an Area Chair for an ACM MM 2025 Workshop and as a reviewer for CVPR, NeurIPS, ICLR, ICCV, IEEE TPAMI, and other venues. I am also the core developer of the open-source External-Attention-pytorch project (12k+ GitHub stars). I have received offers from the flagship top-talent programs of leading tech companies, including Xiaohongshu Red Star, Tencent Qingyun, Tongyi Alibaba Star, ByteDance Jindouyun, Ant Star, Huawei Genius Youth, Meituan Beidou, Xiaomi Top Talent, and JD TGT.

Latest News

Publications

Journal

Yiwei Ma, Ke Ye, Weihuang Lin, Jiayi Ji, Xiaoshuai Sun, Tat-Seng Chua, Rongrong Ji
An Extensive Benchmark for Single-Round and Multi-Round Instruction-Based Image Editing
International Journal of Computer Vision (IJCV), 2026
[PDF] [Code]
Yiwei Ma, Jiayi Ji, Zhipeng Qian, Xiaoshuai Sun, Rongrong Ji
CoP: Chain of Perception for Referring 3D Instance Segmentation
International Journal of Computer Vision (IJCV), 2026
[PDF] [Code]
Yiwei Ma, Weihuang Lin, Zhibin Wang, Jiayi Ji, Xiaoshuai Sun, Weisi Lin, Rongrong Ji
Boosting Multi-Modal Large Language Model with Enhanced Visual Features
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
[PDF] [Code]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji
Image Captioning via Dynamic Path Customization
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
[PDF] [arXiv] [Code]
Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation
ACM Transactions on Multimedia Computing, Communications, and Applications (ToMM), 2024
[PDF] [arXiv] [Code] [Project Page]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Rongrong Ji
Towards Local Visual Modeling for Image Captioning
Pattern Recognition (PR), 2023 (ESI Highly Cited Paper)
[PDF] [arXiv] [Code]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Yongjian Wu, Feiyue Huang, Rongrong Ji
Knowing What It Is: Semantic-enhanced Dual Attention Transformer
IEEE Transactions on Multimedia (TMM), 2022
[PDF] [Code]
Jiayi Ji, Yiwei Ma (co-first author), Xiaoshuai Sun, Yiyi Zhou, Yongjian Wu, Rongrong Ji
Knowing What to Learn: A Metric-oriented Focal Mechanism for Image Captioning
IEEE Transactions on Image Processing (TIP), 2022
[PDF] [Code]

Conference

Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
Conference on Neural Information Processing Systems (NeurIPS), 2024
[arXiv] [Code]
Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
International Conference on Machine Learning (ICML), 2024
[arXiv] [Code] [Project Page]
Yiwei Ma, Xiaoqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
IEEE International Conference on Computer Vision (ICCV), 2023
[PDF] [arXiv] [Code] [Project Page]
Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Ming Yan, Ji Zhang, Rongrong Ji
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
ACM International Conference on Multimedia (ACM MM), 2022 (500+ citations)
[PDF] [arXiv] [Code] [Project Page]
Yiwei Ma, Xiaoshuai Sun, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval
ACM International Conference on Multimedia (ACM MM), 2023
[PDF] [Code] [Project Page]
Weihuang Lin, Yiwei Ma (co-first author), Xiaoshuai Sun, Shuting He, Jiayi Ji, Liujuan Cao, Rongrong Ji
HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation
ACM International Conference on Multimedia (ACM MM), 2025
[arXiv] [Code]
Zhipeng Qian, Yiwei Ma (co-first author), Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji
Multi-branch Collaborative Learning Network for 3D Visual Grounding
European Conference on Computer Vision (ECCV), 2024
[arXiv] [Code]
Sihan Liu, Yiwei Ma (co-first author), Xiaoqing Zhang, Haowei Wang, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Computer Vision and Pattern Recognition Conference (CVPR), 2024
[arXiv] [Code]
Changli Wu, Yiwei Ma (co-first author), Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji, Xiaoshuai Sun
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2024
[arXiv] [Code] [PDF]
Zhipeng Qian, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2024
[Code] [PDF]

Preprint

Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Jiayi Ji, Jie Lou, Debing Zhang, Rongrong Ji
MLLM-Selector: Necessity and Diversity-driven High-value Data Selection for Enhanced Visual Instruction Tuning
arXiv preprint arXiv:2503.20502, 2025
[arXiv]
Weihuang Lin, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
arXiv preprint arXiv:2510.08003, 2025
[arXiv]
Yijun Fan, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
Any-to-3D Generation via Hybrid Diffusion Supervision
arXiv preprint arXiv:2411.14715, 2024
[arXiv]
Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
arXiv preprint arXiv:2407.16198, 2024
[arXiv] [Code]

Projects

dots.vlm1.inst (Xiaohongshu · dots)
Instruction-tuned multimodal large language model from the dots series, released on Hugging Face.
[Hugging Face] [GitHub]
dots.mocr (Xiaohongshu · dots)
Multilingual document layout parsing and OCR model from the dots series, released on Hugging Face.
[Hugging Face] [GitHub] [arXiv]

External-Attention-pytorch
PyTorch implementations of various attention mechanisms, MLPs, re-parameterization, and convolution modules, intended to help readers better understand the corresponding papers.
[GitHub] (12,000+ stars)

Patent

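[Granted] A referring 3D instance segmentation method based on text information. Publication No.: CN117634486A [Details]
[Granted] A referring 3D instance segmentation method based on chain of perception. Publication No.: CN117593527A [Details]
[Granted] A method for phrase-level grounding using text-to-image diffusion models. Publication No.: CN118247799A [Details]
[Granted] A semi-supervised learning method for referring object segmentation. Publication No.: CN117975241A [Details]
[Granted] An image captioning method based on local visual modeling. Publication No.: CN115964530A [Details]
[Granted] A bi-directional one-to-many embedding alignment method for text-based person retrieval. Publication No.: CN116304145A [Details]
[Granted] A 3D referring object segmentation method based on a spatial-aware network. Publication No.: CN118365659A [Details]
[Granted] A method for generating 3D objects from multimodal inputs via hybrid diffusion supervision. Grant Announcement No.: CN119625216B [Details]
[Granted] A dialogue generation method and apparatus based on multimodal large language models. Grant Announcement No.: CN119938874B [Details]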

Major Awards

Professional Activities