-
DocHop: Benchmarking Out-of-domain Multi-hop Reasoning in Information-Dense Documents
Zhuoran Yu, Le Thien Phuc Nguyen, Jaden Park, Xinyi Gu, Zexue He, Soochahn Lee, Rogerio Feris, Yong Jae Lee
International Conference on Machine Learning (ICML), 2026
[arxiv]
[code]
[project page]
-
Large Language Model Teaches Visual Students: Cross-Modality Transfer of Fine-Grained Conceptual Knowledge
Thomas Shih-Chao Liang*, Zhuoran Yu*, Yong Jae Lee (* equal contribution)
International Conference on Machine Learning (ICML), 2026
[arxiv]
[code]
-
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Le Thien Phuc Nguyen*, Zhuoran Yu*, Samuel Low Yu Hang, Subin An, Jeongik Lee, Yohan Ban, SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae Lee (* equal contribution)
Findings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR Findings), 2026
[arxiv]
[code]
[project page]
-
Revisiting Active Speaker Detection: An In-the-Wild Benchmark for Generalization and Robustness
Le Thien Phuc Nguyen*, Zhuoran Yu*, Khoa Quang Nhat Cao, Yuwei Guo, Tu Ho Manh Pham, Tuan Tai Nguyen, Toan Ngo Duc Vo, Lucas Poon, Tuan Khai Nguyen, Soochahn Lee, Yong Jae Lee (* equal contribution)
INTERSPEECH, 2026
[arxiv]
[code]
[project page]
-
Composition-Grounded Instruction Synthesis for Visual Reasoning
Xinyi Gu, Jiayuan Mao, Zhang-Wei Hong, Zhuoran Yu, Pengyuan Li, Dhiraj Joshi, Rogerio Feris, Zexue He
International Conference on Learning Representations (ICLR), 2026
[arxiv]
[code]
[project page]
-
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
Brandon Huang, Hang Hua, Zhuoran Yu, Trevor Darrell, Rogerio Feris, Roei Herzig
International Conference on Learning Representations (ICLR), 2026
[arxiv]
-
How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
Zhuoran Yu, Yong Jae Lee
Conference on Language Modeling (COLM), 2025
[arxiv]
-
LASER: Lip Landmark Assisted Speaker Detection for Robustness
Le Thien Phuc Nguyen*, Zhuoran Yu*, Yong Jae Lee (* equal contribution)
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026 (Oral)
[arxiv]
[code]
-
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
Aniket Rege, Zinnia Nie, Mahesh Ramesh, Unmesh Raskar, Zhuoran Yu, Aditya Kusupati*, Yong Jae Lee*, Ramya Korlakai Vinayak* (* equal advising)
IEEE/CVF International Conference on Computer Vision (ICCV), 2025
[arxiv]
[code]
-
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao*, Yong Jae Lee* (* equal advising)
Transactions on Machine Learning Research (TMLR), 2025 (Featured Certification)
[arxiv]
[openreview]
-
Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation
Zhuoran Yu*, Manchen Wang*, Yanbei Chen, Paolo Favaro, Davide Modolo
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
[arxiv]
-
InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning
Zhuoran Yu, Yin Li, Yong Jae Lee
International Conference on Learning Representations (ICLR), 2023
[arxiv]
[code]
-
Group R-CNN for Weakly Semi-supervised Object Detection with Points
Shilong Zhang*, Zhuoran Yu*, Liyang Liu*, Xinjiang Wang, Aojun Zhou, Kai Chen (* equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arxiv]
[code]
-
Scale-Equalizing Pyramid Convolution for Object Detection
Xinjiang Wang*, Shilong Zhang*, Zhuoran Yu, Litong Feng, Wayne Zhang (* equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
[arxiv]
[code]