Kangsan Kim (김강산) (kangsan.kim [at] kaist [dot] ac [dot] kr). Here is my CV (Curriculum Vitae).
I am a Ph.D. student in the Graduate School of AI at KAIST (MLAI lab), fortunate to be advised by Prof. Sung Ju Hwang.
My research focuses on developing multimodal large language models (MLLMs) that understand the world and interact with humans through visual data. I have previously worked on video understanding and multimodal Retrieval-Augmented Generation (RAG). I am also interested in embodied AI models that operate on egocentric video and real-world agents such as computer use agents and coding agents.
🔥 News
- 2026.02: 🇺🇸 WorldMM got accepted to CVPR 2026, and HoliSafe got accepted to CVPR 2026 Findings!
- 2025.07: Joined NYU as a visiting student under Prof. Mengye Ren.
- 2025.05: 📖 HoliSafe is released on arXiv.
- 2025.05: 🇦🇹 VideoRAG got accepted to ACL Findings 2025.
- 2025.04: 📖 UniversalRAG is released on arXiv.
- 2025.02: 🇺🇸 VideoICL got accepted to CVPR 2025.
📝 Publications
- MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents
  [paper(openreview)] (WIP)
  Kangsan Kim, Yanlai Yang, Suji Kim, Woongyeong Yeo, Youngwan Lee, Sung Ju Hwang†, Mengye Ren†
  Under review, 2026
- WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
  [project page] [paper] [code]
  Woongyeong Yeo*, Kangsan Kim*, Jaehong Yoon†, Sung Ju Hwang†
  Conference on Computer Vision and Pattern Recognition (CVPR) 2026
- HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model
  [project page] [paper] [code]
  Youngwan Lee, Kangsan Kim, Kwanyong Park, Ilchae Jung, Sujin Jang, Seanie Lee, Yong-Ju Lee, Sung Ju Hwang
  Findings of the Conference on Computer Vision and Pattern Recognition (CVPR Findings) 2026
- UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
  [project page] [paper] [code]
  Woongyeong Yeo*, Kangsan Kim*, Soyeong Jeong, Jinheon Baek, Sung Ju Hwang
  Under review, 2025
- VideoRAG: Retrieval-Augmented Generation over Video Corpus
  [paper] [poster] [code]
  Soyeong Jeong*, Kangsan Kim*, Jinheon Baek*, Sung Ju Hwang
  Findings of the Association for Computational Linguistics (ACL Findings) 2025
- VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
  [paper] [poster] [code]
  Kangsan Kim*, Geon Park*, Youngwan Lee, Woongyeong Yeo, Sung Ju Hwang
  Conference on Computer Vision and Pattern Recognition (CVPR) 2025
(*: equal contribution, †: equal advising)
💻 Experiences
- Visiting Ph.D. Student, New York University
  2025.07 - 2025.10, Brooklyn, NY, USA
  Advisor: Prof. Mengye Ren
  Studied question answering over egocentric video streams from multiple embodied agents (MA-EgoQA).
- Computer Vision Engineer Intern, B GARAGE
  2022.10 - 2023.07, San Jose, CA, USA
  Developed an ultra-fast edge instance segmentation model capable of segmenting arbitrary objects in warehouse environments.
- Machine Learning (NLP) Scientist Intern, NAVER
  2021.07 - 2021.10, Remote
  Built and improved an end-to-end Korean-English speech translation model.
📖 Education
- 2024.03 - Current, Ph.D. in Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST).
- 2018.03 - 2024.02, B.S. in Computer Science, Korea Advanced Institute of Science and Technology (KAIST).
🏆 Honors and Awards
- 2023.06 Qualcomm-KAIST Innovation Award.
- 2020.09 Dean’s List, College of Engineering.
💬 Invited Talks
- WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
  2026.01, Multimodal Weekly hosted by TwelveLabs, Online | [post]
- VideoRAG: Retrieval-Augmented Generation over Video Corpus, and Beyond
  2025.09, Multimodal Weekly hosted by TwelveLabs, Online | [post]
- Video as a Knowledge Source: Video-based Retrieval-Augmented Generation in Diverse Scenarios
  2025.09, NYU Global AI Frontier Lab, Brooklyn, NY, USA
💯 Academic Service
- Reviewer for ARR January 2026 and ICML 2026.