Research Interests
Hisham Cholakkal’s research lies at the intersection of computer vision and multimodal learning, with a focus on foundation models and large multimodal models (LMMs). His goal is to build omnimodal AI companions that seamlessly integrate vision, audio, speech and text across languages and cultures, and to deploy these systems on smart wearables such as smart glasses. He is also interested in applying multimodal large language models and AI companions to healthcare and social good. To support this vision, his research program is structured around three interconnected pillars: multimodal learning, healthcare foundation models and advanced visual recognition architectures.
Bio
Hisham Cholakkal is an Assistant Professor at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). Prior to joining MBZUAI, he held research and technical leadership positions at the Inception Institute of Artificial Intelligence (IIAI) in the UAE, Mercedes‑Benz R&D India, BEL’s Central Research Laboratory (India) and the Advanced Digital Sciences Center in Singapore. With more than twelve years of experience in computer vision and multimodal AI, he bridges fundamental research, teaching and AI product development at scale.
As a Principal Investigator at MBZUAI, he has secured more than eight research grants and awards, including the Meta Llama Impact Innovation Award (2024), NVIDIA Academic Grant (2025), Meta Regional Research Grant (2025) and Google Gift Research Award (2023). Papers on which he is last author have received awards and recognitions, including the SAC Highlights Award at EMNLP 2025. His teaching at MBZUAI has been recognized with the inaugural MBZUAI Teaching Excellence Award (2025) and the Founding Service Award. He holds leadership roles at top AI conferences, including General Chair of ACM Multimedia Asia 2026 and Local Chair of ACCV 2028, and has served as Area Chair at CVPR, ICLR, NeurIPS, ACM Multimedia, ECCV and BMVC. He also organizes workshops on foundation models and vision transformers at major venues, including CVPR, ICCV, NeurIPS, ACCV and ICME.
Honors & Awards
- Teaching Excellence Award at MBZUAI (inaugural recipient).
- SAC Highlights Award at EMNLP 2025 for the paper MaViS: Multimodal LLM for Avian Species, co‑authored with MSc student Yevheniia Kryklyvets and collaborators.
- Meta Regional Research Grant 2025 for Project OMER: building multimodal LLMs for smart wearables (Role: PI).
- NVIDIA Academic Grant 2025 to advance clinical reasoning capabilities of healthcare LLMs and LMMs (Role: PI).
- Google Gift Research Award 2023 (Role: PI).
- Meta Llama Impact Innovation Award 2024 for the project BiMediX2: building an Arabic‑English bilingual medical LMM (Role: PI).
- MBZUAI Founding Service Award.
- Co‑inventor on the first US patent granted to MBZUAI and on several other granted US patents.
- General Chair of ACM Multimedia Asia 2026 (Abu Dhabi).
- Best Student Paper Award at VISAPP 2023 (project led by Prof. Fahad Khan).
- Second Place at the AgentX Competition hosted by UC Berkeley for the paper Agent‑X (project led by Dr. Salman Khan).
- Co‑PI for the multi‑institution collaboration project Energy GPT for Oil and Gas.
- Principal Investigator for multiple seed grants, including MBZUAI Seed Fund for Multilingual‑Health LMMs and IIT Delhi–MBZUAI Collaboration Program (Hindi–English healthcare LLM).
Services
- General Chair of ACM Multimedia Asia 2026 (Abu Dhabi).
- Area Chair at CVPR 2026, ICLR 2026, NeurIPS 2025, ACM Multimedia 2025, ECCV 2024 and BMVC 2024.
- Local Chair for ACCV 2028 (Abu Dhabi).
- Recipient of the Founding Faculty Contribution Award from the MBZUAI Computer Vision Department.
- Guest Editor of the Computer Vision and Image Understanding (CVIU) special issue on “Foundational Models for Pixel‑Level Scene Understanding” (2025).
- Primary organizer of workshops on vision transformers and foundation models at CVPR, ICCV, NeurIPS and ACCV, and co‑organizer at ICME.
- Associate Editor for Pattern Analysis and Applications (PAA) and IET Computer Vision.
- Grant proposal reviewer for the National Science Centre, Poland.
- Chair of the Board of Examiners at MBZUAI’s Computer Vision Department and member of the procurement committee.
Selected Talks & Keynotes
- December 2025: “Building multimodal conversational assistants” at IndoML 2025 (Hyderabad, India).
- November 2025: “Advancing Multimodal Intelligence: Building AI Assistants for the Middle East” at the AI Summit 2025 (Museum of the Future, Dubai).
- November 2025: “Potential of healthcare LLMs/LMMs with English–Arabic capabilities in crisis and conflict‑affected settings” at the UNESCWA Workshop (Beirut, Lebanon).
- November 2025: Roundtable on “Improving Healthcare Accessibility Using Multimodal LLMs” at the UNDP Knowledge Summit 2025 (Dubai).
- November 2025: “Building Arabic‑English Foundation Models for Healthcare” at the AI Innovation Day – Healthcare (MBZUAI, UAE).
- December 2024: Lecture on “Building Multilingual Multimodal Conversational Assistants” at the SDAIA Winter School (Riyadh).
- October 2024: “Building Multilingual Multimodal LMMs” at the Meta Open Innovation AI Research Community Annual Workshop (London).
- October 2024: “Beyond Chatbots: Building Multilingual Multimodal Conversational Assistants” at the EMAI workshop, ICIP 2024 (Abu Dhabi).
- August 2024: “How AI and Advanced Technologies Can Drive a Sustainable Future” at the Arab Youth Council for Climate Change (Abu Dhabi).
- May 2024: “Building an Arabic‑English Bilingual Healthcare LLM with Seamless Conversation Capability using Llama‑3” at the Llama Community Summit (Meta, CA).
- November 2023: “AI in Healthcare” at Tawam Hospital (Al Ain, UAE).
Positions Available
I am looking for exceptional candidates for several positions at MBZUAI, including PostDocs, Research Engineers, PhD students, MSc students and Research Interns. PostDoc applicants should have strong experience with multimodal models and LLMs/VLMs. Research Engineers must demonstrate strong development skills through past projects; a background in LLMs/VLMs and generative AI for healthcare is a plus. Research Intern and PhD candidates should have a strong academic record from their BS/MS programs in computer vision or machine learning, with relevant coursework and projects. Publications in top venues such as CVPR, ECCV, ICCV, NeurIPS, ICLR or ICML are desirable. If you are interested, please send me your CV and a link to your GitHub profile.
News
- MediX‑R1 Released: Learn more in the news article and on the project website.
- BiMediX Webinar: Watch the recording on YouTube.
Teaching
Hisham Cholakkal teaches core and advanced computer vision courses at MBZUAI, recognized through both the inaugural Teaching Excellence Award and the Computer Vision Department Teaching and Mentorship Award (2026). Below are brief summaries of his courses.
Human and Computer Vision (CV701)
This course covers fundamental image processing concepts (filtering, edge detection, color planes); classical computer vision methods (corner and blob detection, SIFT); camera geometry and optics (intrinsic and extrinsic parameters, stereo matching and 3D vision); machine learning and deep learning for vision; and advanced topics such as human action recognition and pose estimation.
Offered: Spring 2021, Fall 2021, Fall 2022.
Visual Recognition and Detection (CV703)
This course explores the fundamentals of visual recognition; CNN‑based and transformer‑based architectures; single‑stage and two‑stage object detection; segmentation architectures; video tracking; and cutting‑edge detection frameworks.
Offered: Fall 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025, Spring 2026.
Advanced Computer Vision (CV801)
A two‑part course for first‑semester PhD students. Part I establishes foundations in convolutional and transformer architectures for detection and segmentation. Part II introduces advanced topics, including foundation models, vision–language models, large multimodal and language models, the Segment Anything Model, efficient computer vision, remote sensing change detection and diffusion‑based image generation.
Offered: Fall 2023, Fall 2024, Fall 2025.
Team (Primary Supervision)
I have had the privilege of working with exceptional students, colleagues and collaborators. Below is a partial list:
Postdoctoral Researchers
- Dr. Jean Lahoud
- Dr. Jinxing Zhou
Research Engineers
- Sambal Shikar
- Sandesh Venkatesh Bharadwaj
- Jaseel Muhammad
Current Students (Primary Supervision)
- Sahal Shaji Mullappilly (PhD, MBZUAI, 2023– ; MSc, MBZUAI, 2021–2023)
- Jose Renato Restom Viera (PhD, MBZUAI, 2023– ; MSc, MBZUAI, 2021–2023)
- Mohammad Ahmad Eid Mohamed Almansoori (PhD, MBZUAI, 2023–)
- Komal Kumar (PhD, MBZUAI, 2024–)
- Ameera Ali Bawazir (PhD, MBZUAI, 2024–)
- Mohammed Irfan K (PhD, MBZUAI, 2025–)
- Beknur Kalmakhanbet (MSc, MBZUAI, 2024–)
Alumni
- Dr. Daniya Abdul Kareem (PhD, MBZUAI, 2021–2025; now with TII, Abu Dhabi)
- Yevheniia Kryklyvets (MSc, MBZUAI, 2023–2025; now with G42, UAE)
- Dr. Mustansar Fiaz (Postdoc, MBZUAI; now with IBM Research)
- Dr. Nian Liu
- Mohammad Khaled Almansoori (MSc, MBZUAI, 2021–2023; now with Abu Dhabi Police)
- Yahia Dalbah (MSc, MBZUAI, 2021–2023; now with SAAB, UAE)
- Sara Pieri (MSc, MBZUAI, 2021–2023; joined PhD at INRIA, France)
- Abhishek Singh Gehlot (MSc, MBZUAI, 2021–2023; now with Shinobi Security, UAE)
- Dhanalaxmi Gaddam (MSc, MBZUAI, 2021–2023; now with DP World, UAE)
- Amrin Kareem (MSc, MBZUAI, 2022–2024)
- Shamma Sultan Saeed Alsaedi (MSc, MBZUAI, 2022–2024)
- Ankan Kumar (RA, MBZUAI, 2020–2023; now with University of Edinburgh, UK)
- Amandeep Kumar (RA, MBZUAI, 2022–2024; now with Johns Hopkins University, USA)
- Guolei Sun (Software Engineer at IIAI; now with ETH Zurich)
- Dr. Jiale Cao (Visiting Postdoc at IIAI; now with Tianjin University, China)
Previous Roles
- Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi: Research Scientist (Computer Vision), 2018–2020.
- Mercedes‑Benz R&D India: Senior Technical Lead, Computer Vision Research Team, 2016–2018.
- Advanced Digital Sciences Center, Singapore: Senior Engineer (Computer Vision & GPU/FPGA Efficiency), 2011–2012.
- Central Research Laboratory – BEL, India: Member Research Staff, 2009–2011.
Education
Ph.D. in Computer Engineering, Nanyang Technological University (NTU), Singapore (2016). Research focused on visual attention, visual recognition and learning with limited supervision.
M.Tech. in Digital Signal Processing, Indian Institute of Technology (IIT) Guwahati, India (2009). Research focused on video compression.
Research Demonstrations, Public Engagement & Media Coverage
Our research outputs are frequently showcased as live demonstrations at major international events, widely covered in the media and shared with the public to inspire the next generation of AI practitioners.
Selected Publications
Professor Hisham Cholakkal has published over 100 research papers and holds more than eight granted US patents across three primary research pillars: multimodal learning, healthcare foundation models and efficient visual recognition architectures. Representative publications are listed below. A complete, continuously updated publication list is available on Google Scholar.