
Yiwei Ma   马祎炜

Final-year Ph.D. student at Xiamen University (on the job market)


Email: yiweima@stu.xmu.edu.cn
             [GitHub]   [Google Scholar]

[Biography] [Latest News] [Publications] [Projects] [Major Awards] [Patent] [Professional Activities]

Biography   [back to top]

I am currently a final-year Ph.D. student in the Department of Artificial Intelligence, School of Informatics, Xiamen University, advised by Prof. Rongrong Ji and Prof. Xiaoshuai Sun. I am actively looking for job opportunities (industry / faculty / postdoc); please feel free to contact me via email.

My recent research interests are in Multimodal Large Language Models (MLLMs) and Multimodal Generation.

Latest News   [back to top]

Publications   [back to top]

Journal

Yiwei Ma, Ke Ye, Weihuang Lin, Jiayi Ji, Xiaoshuai Sun, Tat-Seng Chua, Rongrong Ji
An Extensive Benchmark for Single-Round and Multi-Round Instruction-Based Image Editing
International Journal of Computer Vision (IJCV), 2026
[Code]
Yiwei Ma, Jiayi Ji, Zhipeng Qian, Xiaoshuai Sun, Rongrong Ji
CoP: Chain of Perception for Referring 3D Instance Segmentation
International Journal of Computer Vision (IJCV), 2026
[PDF]
Yiwei Ma, Weihuang Lin, Zhibin Wang, Jiayi Ji, Xiaoshuai Sun, Weisi Lin, Rongrong Ji
Boosting Multi-Modal Large Language Model with Enhanced Visual Features
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
[PDF] [Code]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji
Image Captioning via Dynamic Path Customization
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
[PDF] [arXiv] [Code]
Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation
ACM Transactions on Multimedia Computing, Communications, and Applications (ToMM), 2024
[arXiv] [Code] [Project Page]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Rongrong Ji
Towards Local Visual Modeling for Image Captioning
Pattern Recognition (PR), 2023
[PDF] [arXiv] [Code]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Yongjian Wu, Feiyue Huang, Rongrong Ji
Knowing what it is: Semantic-enhanced Dual Attention Transformer
IEEE Transactions on Multimedia (TMM), 2022
[PDF] [Code]
Jiayi Ji, Yiwei Ma (co-first author), Xiaoshuai Sun, Yiyi Zhou, Yongjian Wu, Rongrong Ji
Knowing What to Learn: A Metric-oriented Focal Mechanism for Image Captioning
IEEE Transactions on Image Processing (TIP), 2022
[PDF] [Code]

Conference

Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
Conference on Neural Information Processing Systems (NeurIPS), 2024
[arXiv] [Code]
Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
International Conference on Machine Learning (ICML), 2024
[arXiv] [Code] [Project Page]
Yiwei Ma, Xiaoqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
IEEE International Conference on Computer Vision (ICCV), 2023
[PDF] [arXiv] [Code] [Project Page]
Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Ming Yan, Ji Zhang, Rongrong Ji
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
ACM International Conference on Multimedia (ACM MM), 2022 (Citations: 500+)
[PDF] [arXiv] [Code] [Project Page]
Yiwei Ma, Xiaoshuai Sun, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval
ACM International Conference on Multimedia (ACM MM), 2023
[PDF] [Code] [Project Page]
Weihuang Lin, Yiwei Ma (co-first author), Xiaoshuai Sun, Shuting He, Jiayi Ji, Liujuan Cao, Rongrong Ji
HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation
ACM International Conference on Multimedia (ACM MM), 2025
[arXiv] [Code]
Zhipeng Qian, Yiwei Ma (co-first author), Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji
Multi-branch Collaborative Learning Network for 3D Visual Grounding
European Conference on Computer Vision (ECCV), 2024
[arXiv] [Code]
Sihan Liu, Yiwei Ma (co-first author), Xiaoqing Zhang, Haowei Wang, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Computer Vision and Pattern Recognition Conference (CVPR), 2024
[arXiv] [Code]
Changli Wu, Yiwei Ma (co-first author), Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji, Xiaoshuai Sun
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2024
[arXiv] [Code] [PDF]
Zhipeng Qian, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2024
[Code] [PDF]

Preprint

Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Jiayi Ji, Jie Lou, Debing Zhang, Rongrong Ji
MLLM-Selector: Necessity and Diversity-driven High-value Data Selection for Enhanced Visual Instruction Tuning
arXiv preprint arXiv:2503.20502, 2025
[arXiv]
Weihuang Lin, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
arXiv preprint arXiv:2510.08003, 2025
[arXiv]
Yijun Fan, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
Any-to-3D Generation via Hybrid Diffusion Supervision
arXiv preprint arXiv:2411.14715, 2024
[arXiv]
Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
arXiv preprint arXiv:2407.16198, 2024
[arXiv] [Code]

Projects   [back to top]


External-Attention-pytorch
PyTorch implementations of various attention mechanisms, MLPs, re-parameterization methods, and convolutions, intended to help readers understand the corresponding papers in more depth.
[GitHub] (11,400+ stars)

Patent   [back to top]

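  • (Granted) A referring 3D instance segmentation method based on textual information - Publication No.: CN117634486A - [Details]
  • (Granted) A referring 3D instance segmentation method based on chain-of-perception - Publication No.: CN117593527A - [Details]
  • (Granted) A method for phrase-level grounding using a text-to-image diffusion model - Publication No.: CN118247799A - [Details]
  • (Granted) A semi-supervised learning method for referring object segmentation - Publication No.: CN117975241A - [Details]
  • (Granted) An image captioning method oriented toward local visual modeling - Publication No.: CN115964530A - [Details]
  • (Granted) A bi-directional one-to-many embedding alignment method for text-based person retrieval - Publication No.: CN116304145A - [Details]
  • (Granted) A 3D referring object segmentation method based on a spatial-aware network - Publication No.: CN118365659A - [Details]
  • (Granted) A method for generating 3D objects from multimodal inputs via hybrid diffusion supervision - Grant Publication No.: CN119625216B - [Details]
  • (Granted) A dialogue generation method and device based on a multimodal large language model - Grant Publication No.: CN119938874B - [Details]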

Major Awards   [back to top]

Professional Activities   [back to top]