I am open to research collaborations, partnerships, consultancy, and joint projects with industry.

    I am also looking for exceptional candidates for various positions at MBZUAI: PostDocs, Research Engineers, PhD students (for Fall 2025 intake), and Research Interns. If you are interested, please send your CV and GitHub profile link to me.

    Contact me: hisham.cholakkal(at)mbzuai.ac.ae

    Hisham Cholakkal
    Building Machines that can See

    ABOUT ME
    I am an Assistant Professor at MBZUAI with several years of diverse experience in AI and computer vision research. My background spans fundamental research, teaching, and industrial product development, and I have led research teams in both academic and commercial settings.

    PREVIOUS ROLES

    • Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, UAE: Research Scientist (Computer Vision)
    • Mercedes-Benz R&D, India: Senior Technical Lead, Computer Vision Research Team
    • Central Research Lab-BEL, India: Development of Efficient Computer Vision Algorithms
    • Advanced Digital Science Center, Singapore: Computer Vision, GPU/FPGA Efficiency

    EDUCATION

    • PhD: Nanyang Technological University (NTU), Singapore
    • Masters (M.Tech): Indian Institute of Technology Guwahati (IIT Guwahati), India

    RECOGNITIONS

    • I serve as an Area Chair for top computer vision conferences such as ECCV 2024 and BMVC 2024.
    • I have organized workshops at top conferences such as CVPR 2024, ICCV 2023, NeurIPS 2022, and ACCV 2022.
    • I am a program committee member for several top computer vision conferences, including CVPR, NeurIPS, and ICLR, and I review for journals such as IEEE TPAMI and IJCV.
    • I am an Associate Editor for the IET Computer Vision journal.

    RESEARCH INTEREST

    My research areas include:

    • Multimodal models and LLMs/VLMs
    • Visual recognition
    • AI in healthcare

    My recent focus is on building multimodal conversational agents that can reason and interact seamlessly with humans in real time. Additionally, I am interested in real-world applications of computer vision and machine learning algorithms in healthcare and remote sensing.

    RESEARCH FUNDS/GRANTS

    • Google Research Award, MBZUAI, 2023-2024: Sustainability-tailored Arabic LLM (Role: PI)
    • Weizmann Institute of Science-MBZUAI Joint Research Grant, 2022-2025: Cell Segmentation and Lineage Tracing for Mouse Embryos (Role: PI)
    • MBZUAI Seed Fund: Multilingual Health LMM (Role: PI)
    • MBZUAI Start-up Fund: Efficient and Robust Deep Learning Architectures for Comprehensive Scene Understanding (Role: PI)

    OPEN POSITIONS

    I am looking for exceptional candidates for various positions at MBZUAI: PostDocs, Research Engineers, PhD students (for the Fall 2025 intake), and Research Interns. PostDoc applicants should have strong experience in multimodal models and LLMs/VLMs. Research Engineers must demonstrate strong development skills through past projects (a background in LLMs/VLMs and generative AI for healthcare is a plus). Research Interns and PhD candidates should have a strong academic record in BS/MS programs focused on computer vision/machine learning, with relevant coursework and projects. Previous publications in top conferences such as CVPR, ECCV, ICCV, NeurIPS, ICLR, or ICML are desirable. If you are interested, please send your CV and a link to your GitHub profile to me.

    LATEST NEWS

    • The BiMediX webinar recording is now available on YouTube.
    • Four papers accepted at ECCV 2024! Congratulations to all students and collaborators.
    • Serving as Area Chair at BMVC 2024
    • Serving as Area Chair at ECCV 2024
    • Joined IET Computer Vision Editorial Board as an Associate Editor
    • Received Google Research Award-MBZUAI
    • Paper acceptances in 2024: several papers accepted at CVPR 2024, ICML 2024, ICRA 2024, WACV, and MICCAI 2024. Congratulations to the team.
    • Invited talk at the Llama Community Summit, CA, “Building an Arabic-English Bilingual Healthcare LLM with Seamless Conversation Capability using Llama-3”
    • Successfully organized a workshop on Foundation Models. Congratulations to the team.
    • Received MBZUAI's first US patent. Congratulations to the co-inventors.
    • Received a seed fund for developing a Multilingual Health LLM.
    • MBZUAI Commencement 2024: Four MSc students graduated. Congratulations Hosam, Amrin, Fazli and Shamma!
     

    TEAM


    I have had the privilege of collaborating with exceptional students, colleagues, and collaborators. Below is a partial list:

    Postdoctoral Researchers

    • Jean Lahoud
    • Nian Liu

    Current Students

    1. Sahal Shaji Mullappilly (PhD@MBZUAI, 2023- ; MSc@MBZUAI, 2021-2023)
    2. Jose Renato Restom Viera (PhD@MBZUAI, 2023- ; MSc@MBZUAI, 2021-2023)
    3. Daniya Abdul Kareem (PhD@MBZUAI, 2021- )
    4. Mohammad Ahmad Eid Mohamed Almansoori (PhD@MBZUAI, 2023- )
    5. Yevheniia Kryklyvets (MSc@MBZUAI, 2023- )
    6. Fawaghy Ahmed Saeed Mohamed Alshahmy (MSc@MBZUAI, 2023- )
    7. Tooba Tahreem Sheikh (MSc@MBZUAI, 2023- )
    8. Amrin Kareem (MSc@MBZUAI, 2022-2024)
    9. Mohammed Fazli Imam (MSc@MBZUAI, 2022-2024)
    10. Hosam Mahmoud Engendy (MSc@MBZUAI, 2022-2024)
    11. Shamma Sultan Saeed Alsaedi (MSc@MBZUAI, 2022-2024)

    Alumni

    1. Mustansar Fiaz (Postdoc@MBZUAI, now with IBM Research)
    2. Mohammad Khaled Almansoori (MSc@MBZUAI 2021-2023, now with Abu Dhabi Police)
    3. Yahia Dalbah (MSc@MBZUAI 2021-2023, now with SAAB, UAE)
    4. Sara Pieri (MSc@MBZUAI 2021-2023, joined PhD at INRIA, France)
    5. Abhishek Singh Gehlot (MSc@MBZUAI 2021-2023, now with Shinobi Security, UAE)
    6. Aidana Nurakhmetova (MSc@MBZUAI 2021-2023, now with Abu Dhabi University, UAE)
    7. Dhanalaxmi Gaddam (MSc@MBZUAI 2021-2023, now with DP World, UAE)
    8. Ankan Kumar (RA@MBZUAI 2020-2023, now with University of Edinburgh, UK)
    9. Amandeep Kumar (RA@MBZUAI 2022-2024, now with Johns Hopkins, USA)
    10. Guolei Sun (SE at IIAI, now with ETH Zurich)
    11. Jiale Cao (Visiting Postdoc at IIAI, now with Tianjin University, China)

    RESEARCH

    Representative Research Projects

    I. Foundation Models / Large Multimodal Models (LMMs) with Reasoning Capability

    My recent research focuses on developing LLMs and LMMs for different applications. A few example projects are listed below:

    1. PARIS3D, ECCV 2024

    This recent work published at ECCV 2024 is motivated by the need for integrating reasoning and grounding capabilities for intelligent perception systems, with potential for various applications such as robotics. For instance, an intelligent system should know where to hold a ‘kettle’ or which part of a ‘bottle’ to open in a fine-grained manner without the user explicitly naming the object or its parts. In this project, we introduced a novel problem setting called reasoning-based part segmentation/grounding on 3D point clouds. This task involves generating a part segmentation mask for a 3D object based on implicit textual queries that require complex reasoning. Our PARIS3D model leverages its reasoning capabilities to
    (i) understand implicit instructions, such as referencing properties of an object or its parts without explicit articulation, and
    (ii) explain or justify its responses, whether in generated text or predicted segmentation masks. To support future research on this important problem, we have introduced and open-sourced an evaluation benchmark and a dataset named RPSeg3D, comprising 2,624 3D objects and over 60k instructions with part segmentation masks.
      

    [Paper Link] [Project Page/Code]
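    To make the task concrete, here is a minimal usage sketch of how a reasoning-based part query could look. The package, class, checkpoint, and helper names below are illustrative assumptions, not the released PARIS3D API; see the project page for the actual code.

        # Hypothetical interface for reasoning-based 3D part segmentation.
        # Package, class, and helper names are assumptions, not the released code.
        from paris3d import PARIS3D, load_point_cloud

        model = PARIS3D.from_pretrained("paris3d-base")   # assumed checkpoint name
        points = load_point_cloud("kettle.ply")           # (N, 6) array: xyz + rgb

        # The query never names the part explicitly; the model must infer it.
        mask, explanation = model.segment(
            points,
            query="Segment the part of this object you would hold to pour safely.",
        )
        print(explanation)    # textual justification accompanying the part mask
        print(mask.shape)     # per-point binary mask over the N input points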

    2. GlaMM, CVPR 2024

     

    Many existing open-source and closed-source large multimodal models lack explicit grounding capability to accurately segment referred object regions. To address this limitation, we introduced the first model that can seamlessly output text responses intertwined with pixel-precise segmentation masks, taking textual and optional visual prompts as input. The unique structure of this model supports a wide range of tasks, including grounded conversation generation (GCG), referring expression segmentation, image- and region-level captioning, and vision-language conversations. To facilitate future research on this challenging problem, we propose the densely annotated Grounding-anything Dataset (GranD), which encompasses 7.5M unique concepts grounded in a total of 810M regions with segmentation masks.

    [Paper] [Project Page/Code]
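    The sketch below illustrates the intended input/output behavior for grounded conversation generation. The package, class, and method names are assumptions for illustration, not the released GLaMM code; the project page has the actual interface.

        # Illustrative pseudo-usage of a pixel-grounding LMM; all names are assumed.
        from glamm import GroundingLMM, load_image

        model = GroundingLMM.from_pretrained("glamm-full")   # assumed checkpoint name
        image = load_image("kitchen.jpg")                    # placeholder image path

        # The text response comes back interleaved with pixel-precise masks
        # for each phrase the model grounds in the image.
        response = model.chat(image, "Describe the scene and ground every object you mention.")
        print(response.text)              # caption with grounded phrase markers
        for phrase, mask in response.groundings:
            print(phrase, mask.shape)     # binary mask aligned to the input image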

    3. PALO: Multilingual Visual Reasoning

     

    This project introduces an inclusive Large Multilingual Multimodal Model called PALO. PALO offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese, which together cover ~5B people (65% of the world population).

    [Paper][Project Page/Code]

    4. MobiLlama: Small Language Models on Mobile Phones

     

    This project explores the challenge of designing accurate yet efficient Small Language Models (SLMs) for on-device processing on resource-constrained mobile devices. Our focus is to develop fully transparent, open-source, mobile-friendly LLMs with enhanced energy efficiency, a low memory footprint, and high response efficiency.

    [Paper] [Project Page/Code]
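    As a usage sketch, a checkpoint from this family can be loaded with the Hugging Face transformers library. The checkpoint id below is assumed from the public release; check the project page for the exact model name and loading instructions.

        # Minimal generation sketch with Hugging Face transformers.
        # The checkpoint id is an assumption; verify it on the project page.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "MBZUAI/MobiLlama-05B"    # assumed checkpoint id
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float16,   # half precision to reduce the memory footprint
            trust_remote_code=True,      # the release may ship custom model code
        )

        inputs = tokenizer("What makes a language model mobile-friendly?", return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=64)
        print(tokenizer.decode(output[0], skip_special_tokens=True))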

    5. Arabic Mini-ClimateGPT, EMNLP Findings 2023

    Climate change is one of the most significant challenges we face together as a society. Creating awareness and educating policy makers about the wide-ranging impact of climate change is an essential step towards a sustainable future. We propose a lightweight Arabic Mini-ClimateGPT that is built on an open-source LLM and fine-tuned on Clima500-Instruct, a curated conversational-style Arabic instruction-tuning dataset with over 500k instructions about climate change and sustainability. This project was later extended to Jais-Climate and presented at COP28.

    [Paper] [Project Page]
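    For illustration, an instruction/response pair in the conversational style described above might look like the following sketch. The field names and text are assumptions for illustration, not records taken from Clima500-Instruct.

        # Hypothetical Clima500-Instruct-style record (field names and text assumed).
        sample = {
            # "What are the main effects of climate change on water resources?"
            "instruction": "ما هي الآثار الرئيسية لتغير المناخ على الموارد المائية؟",
            # "Climate change alters rainfall patterns and increases drought risk ..."
            "response": "يغيّر تغير المناخ أنماط هطول الأمطار ويزيد من مخاطر الجفاف ...",
        }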

    II. AI in Healthcare

    1. Generative AI in Healthcare

           (i)  BiMediX  

              [Paper] [Project Page] [HF Demo] [Telegram Live Demo]

    In this project, we introduce the first bilingual medical mixture-of-experts LLM designed for seamless interaction in both English and Arabic. The model supports a wide range of medical interactions in both languages, including multi-turn chats that inquire about additional details such as patient symptoms and medical history, multiple-choice question answering, and open-ended question answering. The model achieves state-of-the-art performance on multiple Arabic and English medical LLM evaluation benchmarks. A generic sketch of the mixture-of-experts routing idea follows the contribution list below.

    Our contributions are as follows:

    • We introduce BiMediX, the first bilingual medical LLM with expertise in both English and Arabic, enabling seamless medical interactions such as multi-turn chats, multiple-choice question answering, and closed question answering.
    • We developed a semi-automated translation pipeline with human verification for high-quality translation of English medical texts into Arabic, aiding the creation of a dataset and benchmark for Arabic healthcare LLM evaluation.
    • We curated the BiMed1.3M dataset, a comprehensive Arabic-English bilingual instruction set with over 1.3 million instructions and 632 million healthcare-specialized tokens, supporting diverse medical interactions and enabling a chatbot for patient follow-ups, with a 1:2 Arabic-to-English ratio across medical content.
    • BiMediX outperforms existing models on medical benchmarks while being 8 times faster than comparable existing approaches.
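    As background for the mixture-of-experts design mentioned above, here is a textbook top-k token router of the kind used in MoE LLMs, written as a minimal PyTorch sketch. It is a generic illustration only; it is not BiMediX's actual architecture, routing scheme, or hyperparameters.

        # Generic top-k mixture-of-experts layer (illustrative; not BiMediX's code).
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TopKMoE(nn.Module):
            def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
                super().__init__()
                self.k = k
                self.router = nn.Linear(d_model, n_experts)  # scores each token per expert
                self.experts = nn.ModuleList(
                    nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                    for _ in range(n_experts)
                )

            def forward(self, x):                            # x: (tokens, d_model)
                gate = F.softmax(self.router(x), dim=-1)
                top_w, top_i = gate.topk(self.k, dim=-1)
                top_w = top_w / top_w.sum(-1, keepdim=True)  # renormalize over chosen experts
                out = torch.zeros_like(x)
                for slot in range(self.k):                   # each token visits only k experts
                    for e, expert in enumerate(self.experts):
                        sel = top_i[:, slot] == e            # tokens routed to expert e
                        if sel.any():
                            out[sel] += top_w[sel, slot].unsqueeze(-1) * expert(x[sel])
                return out

    Because each token is processed by only k of the n experts, total parameter count grows with the number of experts while per-token compute stays close to that of a single dense feed-forward block.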

    (ii) XrayGPT

     

    In this project, we develop a conversational agent that can understand X-ray images and respond to user queries about them through multi-turn conversations.
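    A sketch of the intended interaction flow is shown below; the class and method names are illustrative assumptions rather than the released XrayGPT code.

        # Hypothetical multi-turn X-ray QA flow (names assumed, not the released API).
        from xraygpt import XrayGPT

        agent = XrayGPT.from_pretrained("xraygpt-base")         # assumed checkpoint name
        chat = agent.new_conversation(image="chest_xray.png")   # placeholder image path

        print(chat.ask("Summarize the key findings in this chest X-ray."))
        # Follow-up questions reuse the conversation history (multi-turn).
        print(chat.ask("Is there any evidence of pleural effusion?"))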

    (iii) Few-Shot Medical Image Generation, MICCAI 2023

    In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues. Our few-shot generation method, named XM-GAN, takes one base image and a pair of reference tissue images as input and generates high-quality yet diverse images.

    [Paper] [Project Page/Code] 

    2. Medical Image Analysis

    In this project, we develop novel segmentation approaches for 2D and 3D medical and microscopic images.

    Selected Publications

    1. Semi-supervised Medical Image Segmentation, MICCAI 2024
       [Paper] [Project Page/Code]

    2. Microscopic Image Segmentation, BMVC 2023 (Oral)
       [Paper] [Project Page]

    3. 3D Medical Image Segmentation, IEEE TAI, 2024
       The proposed vMixer framework exploits explicit local and global volumetric features to better learn the shape-boundary details of organs. We also provide an extensive study on the selection of architectural designs, adapted from the 2D vision literature for 3D medical segmentation, for better boundary localization. Finally, we exploit the transfer learning capabilities of the proposed vMixer where training data are limited.
       [Paper] [Project Page/Code]

    4. 3D Medical Image Segmentation, ISBI 2024 (Oral)
       Here we propose a directional window attention based method for medical image segmentation.
       [Paper] [Project Page/Code]

     

    III. Visual Recognition Architectures

    This project focuses on developing novel architectures for different visual recognition applications such as image classification, object detection, person search, remote sensing change detection, and instance segmentation on images, videos, radar data, and 3D point clouds.

    Representative Publications: 

    1. High-quality Object Detection and Instance Segmentation, CVPR 2020
    [Paper] [Project Page/Code]

    2. Real-time Instance Segmentation on Images and Videos, ECCV 2020
    [Paper] [Project Page/Code]

    3. Video Instance Segmentation, ECCV 2022
    [Paper] [Project Page/Code]

    4. Computationally Efficient Architectures for Mobile Devices, ECCV Workshop, 2022
    [Paper] [Project Page/Code]

    5. Radar Object Detection, WACV 2023
    [Paper] [Project Page/Code]

    6. Person Search
    CVPR 2022 [Code]; WACV 2024 [Paper] [Code]

    7. Visual Recognition Architectures for Federated Learning, NeurIPS 2023
    [Paper] [Code]

    IV. Open World Learning and Learning with Limited (Zero-shot/Few-shot) Supervision

    The objective of this project is to investigate novel learning approaches for the practical deployment of visual recognition systems, such as open-world learning and learning with limited supervision (e.g., few-shot learning and zero-shot learning).

    1. Semi-supervised Open-World Object Detection in Natural Images and Satellite Imagery, AAAI 2024
    [Paper] [Project Page/Code]

    2. Open-World 3D Indoor Instance Segmentation, NeurIPS 2023
    [Paper] [Project Page/Code]

    3. Open-World Video Instance Segmentation, IJCV 2024
    [Paper] [Project Page]

    4. Few-shot Semantic Segmentation, ICML 2024
    [Paper] [Project Page]

    5. Object Counting with Image-level Supervision, CVPR 2019
    [Paper] [Project Page/Code]

    PUBLICATIONS

    For the full list, visit: https://scholar.google.com/citations?user=bZ3YBRcAAAAJ&hl=en

    1. Amrin Kareem, Jean Lahoud, Hisham Cholakkal, "PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model", in ECCV, 2024. [Paper Link] [Project Page/Code]

    2. Ankit Das, Chandan Gautam, Hisham Cholakkal, Pritee Agrawal, Feng Yang, Ramasamy Savitha, Yong Liu, "Decoupled Training for Semi-supervised Medical Image Segmentation with Worst-Case-Aware Learning", in MICCAI, 2024.

    3. Sahal Shaji Mullappilly, Abhishek Singh Gehlot, Rao Muhammad Anwer, Fahad Shahbaz Khan, Hisham Cholakkal, "Semi-supervised Open-World Object Detection", in AAAI, 2024. [Paper] [Project Page/Code]

    4. S. Pieri, J.R. Restom, S. Horvath, Hisham Cholakkal, "Handling Data Heterogeneity via Architectural Design for Federated Visual Recognition", in NeurIPS, 2023. [Paper] [Project Page/Code]

    5. Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman H. Khan, Timothy Baldwin, Hisham Cholakkal, "BiMediX: Bilingual Medical Mixture of Experts LLM", arXiv:2402.13253, 2024. [Paper] [Project Page] [HF Demo] [Telegram Live Demo]

    6. Omkar Thawkar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan, "XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models", arXiv:2306.07971, 2023. [Paper] [Project Page/Code]

    7. Mustansar Fiaz, Moein Heidari, Rao Muhammad Anwer, Hisham Cholakkal, "SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation", in BMVC (Oral), 2023. [Paper] [Project Page/Code]

    8. Daniya Kareem, Mustansar Fiaz, Noa Novershtern, J. Hanna, Hisham Cholakkal, "Improving 3D Medical Image Segmentation at Boundary Regions using Local Self-attention and Global Volume Mixing", in IEEE Transactions on Artificial Intelligence (TAI), 2023. [Paper] [Project Page/Code]

    9. Daniya Kareem, Mustansar Fiaz, Noa Novershtern, Hisham Cholakkal, "Medical Image Segmentation Using Directional Window Attention", in ISBI (Oral), 2024. [Paper] [Project Page/Code]

    10. Amandeep Kumar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan, "Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification", in MICCAI, 2023. [Paper] [Project Page/Code]

    11. Sahal Shaji Mullappilly, Abdelrahman Shaker, Omkar Thawakar, Hisham Cholakkal, Rao Anwer, Salman Khan, Fahad Khan, "Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM", in EMNLP Findings, 2023 (extended as Jais-Climate for COP28). [Paper] [Project Page/Code]

    12. Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, "ChangeBind: A Hybrid Change Encoder for Remote Sensing Change Detection", in IGARSS, 2024.

    13. Hisham Cholakkal*, G. Sun*, S. Khan, F.S. Khan, L. Shao, L. Van Gool, "Towards Partial Supervision for Generic Object Counting in Natural Scenes", in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.

    14. Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Khan, Ming-Hsuan Yang, Ling Shao, "D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations", in ICCV, 2021, pp. 13588-13597.

    15. M. Khaleed Almansoori, M. Fiaz, Hisham Cholakkal, "DDAM-PS: Diligent Domain Adaptive Mixer for Person Search", in WACV, 2024.

    16. Mustansar Fiaz, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, "SAT: Scale-Augmented Transformer for Person Search", in WACV, 2023, pp. 4809-4818.

    17. Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Fahad Shahbaz Khan, "PS-ARM: An End-to-End Attention-Aware Relation Mixer Network for Person Search", in ACCV, 2022, pp. 234-250.

    18. Y. Dalbah, J. Lahoud, Hisham Cholakkal, "TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation", in WACV, 2024.

    19. Yahia Dalbah, Jean Lahoud, Hisham Cholakkal, "RadarFormer: Lightweight and Accurate Real-Time Radar Object Detection Model", in SCIA, 2023, pp. 341-358.

    20. Aidana Nurakhmetova, Jean Lahoud, Hisham Cholakkal, "Data-Efficient Transformer-Based 3D Object Detection", in VISIGRAPP, 2023, pp. 615-623.

    21. Dhanalaxmi Gaddam, Jean Lahoud, Fahad Shahbaz Khan, Rao Muhammad Anwer, Hisham Cholakkal, "CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection", in ACM Multimedia Asia, 2022, pp. 27:1-27:8.

    22. Amandeep Kumar, Muzammal Naseer, S. Narayan, Rao Anwer, Salman Khan, Hisham Cholakkal, "Multi-modal Generation via Cross-Modal In-Context Learning", arXiv:2405.18304, 2024.

    23. Abdulaziz Amer Aleissaee, Amandeep Kumar, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal, Gui-Song Xia, Fahad Shahbaz Khan, "Transformers in Remote Sensing: A Survey", in Remote Sensing, 2023.

    24. Long Li, Junwei Han, Dingwen Zhang, Zhongyu Li, Salman Khan, Rao Anwer, Hisham Cholakkal, Nian Liu, Fahad Shahbaz Khan, "CONDA: Condensed Deep Association Learning for Co-Salient Object Detection", in ECCV, 2024.

    25. Mohamed El Amine Boudjoghra, Jean Lahoud, Salman Khan, Hisham Cholakkal, Rao Anwer, Fahad Shahbaz Khan, "OpenDistill3D: Open-World 3D Instance Segmentation with Unified Self-Distillation for Continual Learning and Unknown Class Discovery", in ECCV, 2024.

    26. Amandeep Kumar, Muhammad Awais, Sanath Narayan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer, "Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning", in ECCV, 2024.

    27. Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan, "Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery", in CVPR, 2024.

    28. Hanoona Rasheed, Muhammad Maaz, Sahal Shaji, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan, "GLaMM: Pixel Grounding Large Multimodal Model", in CVPR, 2024.

    29. Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan, "PALO: A Polyglot Large Multimodal Model for 5B People", arXiv:2402.14818, 2024.

    30. Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan, "MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT", arXiv:2402.16840, 2024. [Paper] [Project Page/Code]

    31. Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan, "Remote Sensing Change Detection with Transformers Trained from Scratch", in IEEE Transactions on Geoscience and Remote Sensing, 2024.

    32. Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan, "Generative Multiplane Neural Radiance for 3D-Aware Image Generation", in ICCV, 2023.

    33. Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan, "Person Image Synthesis via Denoising Diffusion Model", in CVPR, 2023.

    34. Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan, "PSTR: End-to-End One-Step Person Search With Transformers", in CVPR, 2022, pp. 9448-9457.

    35. Jin Xie, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Mubarak Shah, "Count- and Similarity-Aware R-CNN for Pedestrian Detection", in ECCV, 2020, pp. 88-104.

    36. Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan, "Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer", in ECCV, 2022, pp. 666-681.

    37. J. Cao, R.M. Anwer, Hisham Cholakkal, F.S. Khan, Y. Pang, L. Shao, "SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation", in ECCV, 2020.

    38. Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan, "Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation", arXiv:2406.02548. [Project Page/Code]

    39. Jean Lahoud, Fahad Khan, Hisham Cholakkal, Rao Anwer, Salman Khan, "Long-Tailed 3D Semantic Segmentation with Adaptive Weight Constraint and Sampling", in ICRA, 2024.

    40. Mohamed El Amine Boudjoghra, Salwa K. Al Khatib, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Salman H. Khan, Fahad Shahbaz Khan, "3D Indoor Instance Segmentation in an Open-World", in NeurIPS, 2023.

    41. Muhammad Awais, Muzammal Naseer, Salman H. Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan, "Foundational Models Defining a New Era in Vision: A Survey and Outlook", under review at IEEE TPAMI, arXiv:2307.13721, 2023. [Project Page/Code]

    SELECTED PATENTS

     

    • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Anwer, Fahad Khan, ”System and method for handwriting generation”, US Patent Granted, Num.:11756244, 2023.

    • Hisham Cholakkal, G. Sun, FS Khan and L. Shao, ”Object counting and instance segmentation using neural network architectures with image-level supervision”, US Patent Granted.

    • Hisham Cholakkal, J. Cao, R.M Anwer, FS Khan, Y. Pang, L. Shao, ”Dense and Discriminative Neural Network Architectures for Improved Object Detection and Instance Segmentation”, US Patent Granted, Num.:11244188, 2022.

    • Hisham Cholakkal, Sanath Narayan, Arjun Jain, Shuaib Ahmed, Amit Bhatkal, M.B.R Reddy, Apurbaa Mallik, ”Method for identifying a hand pose in a vehicle”, US Patent Granted, Number 11887396, 2024.

    • M. Maaz, A. Shaker, Hisham Cholakkal, S. Khan, S.W. Zamir, R.M. Anwer, F.S. Khan, "System and method for efficiently amalgamated CNN-transformer architecture for Mobile Vision Applications", US Patent App. 18/078,657, 2024.

    • Amandeep Kumar, Ankan Kumar Bhunia, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Fahad Khan, ”System and method of cross-modulated dense local fusion for few-shot image generation”, US Patent App. Number 17983952, 2024.

    • Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Fahad Khan, ”System and method for attention-aware relation mixer for person search”, US Patent App. Number 17983741, 2023.

    • Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao M. Anwer, Muhammad Haris, Salman Khan, Fahad Khan, ”System and method for video instance segmentation via multi-scale spatio-temporal split attention transformer”, US Patent App. Number 17983841, 202


    TEACHING

    1. Human and Computer Vision (CV701)

    Course Contents: The course covers the following: (i) introduction to fundamental image processing concepts such as filtering, edge detection, and color planes; (ii) computer vision concepts such as corner detection, blob detection, and SIFT; (iii) cameras and optics, the homogeneous coordinate system, camera parameters (extrinsic and intrinsic), stereo matching, and 3D vision; (iv) machine learning and deep learning fundamentals, and learning-based computer vision algorithms; (v) human action recognition, human pose estimation, and other advanced computer vision techniques.
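    As a small taste of the filtering and edge-detection material, here is a minimal, self-contained OpenCV sketch of Gaussian smoothing followed by Canny edge detection; the image path is a placeholder.

        # Gaussian smoothing followed by Canny edge detection (placeholder image path).
        import cv2

        img = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)

        # Smooth first so the edge detector responds to structure rather than noise.
        blurred = cv2.GaussianBlur(img, (5, 5), 1.4)

        # Hysteresis thresholds: gradients above 150 are strong edges; those between
        # 50 and 150 are kept only when connected to a strong edge.
        edges = cv2.Canny(blurred, 50, 150)
        cv2.imwrite("edges.png", edges)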

    Course Offerings: This is a core course for first-semester MSc and PhD Computer Vision students at MBZUAI. From Fall 2023, this course is offered only to MSc students. I have offered this course in the following semesters at MBZUAI:

    • Spring 2021 (for MSc and PhD students)
    • Fall 2021 (for MSc and PhD students)
    • Fall 2022 (for MSc and PhD students)

    2. Visual Recognition and Detection (CV703) 

    Course Contents: The course covers the following topics: fundamentals of visual recognition, CNN-based visual recognition architectures, transformer-based visual recognition architectures, single-stage object detection, two-stage object detection, end-to-end object detection using transformers, remote sensing object detection, segmentation architectures, two-stage instance segmentation, single-stage instance segmentation, and video tracking. 
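    To illustrate the two-stage detection material, the sketch below runs a pre-trained Faster R-CNN from torchvision on a single image; the image path is a placeholder.

        # Two-stage object detection with a pre-trained torchvision Faster R-CNN.
        import torch
        from torchvision.io import read_image
        from torchvision.models.detection import (
            FasterRCNN_ResNet50_FPN_Weights,
            fasterrcnn_resnet50_fpn,
        )

        weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
        model = fasterrcnn_resnet50_fpn(weights=weights).eval()

        img = read_image("street.jpg")          # uint8 CHW tensor; placeholder path
        batch = [weights.transforms()(img)]     # apply the weights' own preprocessing

        with torch.no_grad():
            pred = model(batch)[0]              # dict of 'boxes', 'labels', 'scores'

        for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
            if score > 0.8:                     # keep only confident detections
                print(weights.meta["categories"][int(label)], box.tolist())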

    Course Offerings: This is a core course for second-semester MSc and PhD Computer Vision students at MBZUAI. Starting from Spring 2024, this course is offered only to MSc students. I have offered this course in the following semesters at MBZUAI: 

    • Fall 2021 (core course for MSc and PhD students)
    • Spring 2022 (core course for MSc and PhD students)
    • Spring 2023 (core course for MSc and PhD students)
    • Spring 2024 (core course for MSc students)

    3. Advanced Computer Vision (CV801) 

    Course Contents: The course has two parts. The first part aims to build the essential computer vision background for first-semester PhD students, while the second part introduces advanced computer vision concepts. The first part covers an introduction to convolutional neural networks, vision transformers, and object detection and segmentation using convolutional neural networks and transformers. The second part introduces foundation models, vision-language models, fundamentals of large language models and large multimodal models, the segment anything model, efficient computer vision, remote sensing change detection, and image generation and diffusion models. 
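    As a concrete example of the vision-language material, the sketch below performs zero-shot image classification with CLIP via the Hugging Face transformers library; the image path and label set are placeholders.

        # Zero-shot classification with CLIP (placeholder image path and labels).
        from PIL import Image
        from transformers import CLIPModel, CLIPProcessor

        model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

        image = Image.open("scene.jpg")
        labels = ["a photo of a cat", "a photo of a dog", "a satellite image"]

        # Image-text similarity scores, turned into probabilities over the label set.
        inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
        print(dict(zip(labels, probs[0].tolist())))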

    Course Offerings: This is a core course for first-semester PhD Computer Vision students at MBZUAI. I have offered this course in the following semesters:

    • Fall 2023 (core course for PhD students)
    • Fall 2024 (core course for PhD students) - to be offered
