I am currently a final-year Ph.D. student in the Department of Artificial Intelligence, School of Informatics, Xiamen University, advised by Prof. Rongrong Ji and Prof. Xiaoshuai Sun. I am actively looking for job opportunities (industry, faculty, or postdoc); please feel free to contact me via email.
My recent research interests are in Multimodal Large Language Models (MLLMs) and Multimodal Generation.
Yiwei Ma, Ke Ye, Weihuang Lin, Jiayi Ji, Xiaoshuai Sun✉, Tat-Seng Chua, Rongrong Ji
An Extensive Benchmark for Single-Round and Multi-Round Instruction-Based Image Editing
International Journal of Computer Vision (IJCV), 2026 [Code]
Yiwei Ma, Jiayi Ji, Zhipeng Qian, Xiaoshuai Sun✉, Rongrong Ji
CoP: Chain of Perception for Referring 3D Instance Segmentation
International Journal of Computer Vision (IJCV), 2026 [PDF]
Yiwei Ma, Weihuang Lin, Zhibin Wang, Jiayi Ji, Xiaoshuai Sun✉, Weisi Lin, Rongrong Ji
Boosting Multi-Modal Large Language Model with Enhanced Visual Features
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025 [PDF] [Code]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun✉, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji
Image Captioning via Dynamic Path Customization
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024 [PDF] [arXiv] [Code]
Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun✉, Guannan Jiang, Annan Shu, Rongrong Ji
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation
ACM Transactions on Multimedia Computing, Communications, and Applications (ToMM), 2024 [arXiv] [Code] [Project Page]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun✉, Yiyi Zhou, Rongrong Ji
Towards Local Visual Modeling for Image Captioning
Pattern Recognition (PR), 2023 [PDF] [arXiv] [Code]
Yiwei Ma, Jiayi Ji, Xiaoshuai Sun✉, Yiyi Zhou, Yongjian Wu, Feiyue Huang, Rongrong Ji
Knowing what it is: Semantic-enhanced Dual Attention Transformer
IEEE Transactions on Multimedia (TMM), 2022 [PDF] [Code]
Jiayi Ji, Yiwei Ma (co-first author), Xiaoshuai Sun✉, Yiyi Zhou, Yongjian Wu, Rongrong Ji
Knowing What to Learn: A Metric-oriented Focal Mechanism for Image Captioning
IEEE Transactions on Image Processing (TIP), 2022 [PDF] [Code]
Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun✉, Rongrong Ji
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv] [Code]
Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun✉, Rongrong Ji
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
International Conference on Machine Learning (ICML), 2024 [arXiv] [Code] [Project Page]
Yiwei Ma, Xiaoqing Zhang, Xiaoshuai Sun✉, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
IEEE International Conference on Computer Vision (ICCV), 2023 [PDF] [arXiv] [Code] [Project Page]
Yiwei Ma, Guohai Xu, Xiaoshuai Sun✉, Ming Yan, Ji Zhang, Rongrong Ji
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
ACM International Conference on Multimedia (ACM MM), 2022 (Citations: 500+) [PDF] [arXiv] [Code] [Project Page]
Yiwei Ma, Xiaoshuai Sun✉, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval
ACM International Conference on Multimedia (ACM MM), 2023 [PDF] [Code] [Project Page]
Weihuang Lin, Yiwei Ma (co-first author), Xiaoshuai Sun✉, Shuting He, Jiayi Ji, Liujuan Cao, Rongrong Ji
HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation
ACM International Conference on Multimedia (ACM MM), 2025 [arXiv] [Code]
Zhipeng Qian, Yiwei Ma (co-first author), Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun✉, Rongrong Ji
Multi-branch Collaborative Learning Network for 3D Visual Grounding
European Conference on Computer Vision (ECCV), 2024 [arXiv] [Code]
Sihan Liu, Yiwei Ma (co-first author), Xiaoqing Zhang, Haowei Wang, Jiayi Ji✉, Xiaoshuai Sun, Rongrong Ji
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Computer Vision and Pattern Recognition Conference (CVPR), 2024 [arXiv] [Code]
Changli Wu, Yiwei Ma (co-first author), Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji✉, Xiaoshuai Sun
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2024 [arXiv] [Code] [PDF]
Zhipeng Qian, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun✉
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2024 [Code] [PDF]
Yiwei Ma, Guohai Xu, Xiaoshuai Sun✉, Jiayi Ji, Jie Lou, Debing Zhang, Rongrong Ji
MLLM-Selector: Necessity and Diversity-driven High-value Data Selection for Enhanced Visual Instruction Tuning
arXiv preprint arXiv:2503.20502, 2025 [arXiv]
Weihuang Lin, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun✉, Rongrong Ji
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
arXiv preprint arXiv:2510.08003, 2025 [arXiv]
Yijun Fan, Yiwei Ma (co-first author), Jiayi Ji, Xiaoshuai Sun✉, Rongrong Ji
Any-to-3D Generation via Hybrid Diffusion Supervision
arXiv preprint arXiv:2411.14715, 2024 [arXiv]
Yiwei Ma, Zhibin Wang, Xiaoshuai Sun✉, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
arXiv preprint arXiv:2407.16198, 2024 [arXiv] [Code]
External-Attention-pytorch: PyTorch implementations of various attention mechanisms, MLPs, re-parameterization techniques, and convolutions, intended to help readers better understand the corresponding papers. [GitHub] (11,400+ stars)