Active
Vision-Language Models
Spatial Understanding for Vision-Language Models

I am currently working on spatial understanding for VLMs: improving spatial representations in both standard and efficient VLMs, building on our prior work on context-aware object recognition and closed-form adaptation (Koo-Fu CLIP).

Recognition & Benchmarks
Aiming for Perfect ImageNet-1k

The goal is a complete reannotation of the ImageNet-1k validation set; the project is ongoing, with a publication in preparation. It builds on our analysis of ImageNet's labeling flaws and on VLM-based recognition methods.

Video Understanding
RL for Video Object Segmentation

Investigating reinforcement learning for learned memory control in the Segment Anything Model 2 (SAM2), with the goal of improving long-form video object segmentation by dynamically managing the memory bank.
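Learned memory control of this kind can be framed as an eviction policy over a fixed-capacity bank of past frame features. A minimal sketch, assuming a simplified interface (the `MemoryBank` class and the redundancy heuristic below are illustrative stand-ins, not SAM2's actual memory attention or a trained RL policy):

```python
import numpy as np

class MemoryBank:
    """Fixed-capacity memory of past frame features.

    When full, an eviction policy picks which slot to drop; a learned
    (e.g. RL-trained) policy would replace the heuristic below.
    """
    def __init__(self, capacity=7):
        self.capacity = capacity
        self.frames = []  # list of (frame_idx, feature_vector) pairs

    def add(self, frame_idx, feature, policy):
        if len(self.frames) >= self.capacity:
            self.frames.pop(policy(self.frames, feature))
        self.frames.append((frame_idx, feature))

def redundancy_policy(frames, new_feature):
    """Heuristic stand-in for a learned policy: evict the stored frame
    most similar (cosine) to the incoming one, keeping memories diverse."""
    sims = [
        float(f @ new_feature) / (np.linalg.norm(f) * np.linalg.norm(new_feature))
        for _, f in frames
    ]
    return int(np.argmax(sims))
```

An RL formulation would replace `redundancy_policy` with a policy network whose reward is downstream segmentation quality over the long video, so that what to remember is learned rather than hand-designed.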

Past
2023 – 2025
Fine-grained Species Classification & Biodiversity Benchmarks

Creating large-scale multi-modal datasets and challenges for fine-grained visual categorization of fungi and other species. Co-organized the FungiCLEF competition, including its FGVC @ CVPR 2025 workshop edition, and contributed to the broader LifeCLEF benchmarking ecosystem.

2022 – 2024
Test-Time Adaptation for Segmentation

Methods for adapting segmentation models at test time using only a single image, without access to training data or labels. Work done in collaboration with Chaim Baskin and Alex Bronstein (Technion).
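One common recipe for this setting is entropy minimization on the single test image, updating only normalization-layer parameters. A minimal sketch under that assumption (a generic illustration of the setting, not the exact method from this work):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adapt_on_single_image(model, image, steps=10, lr=1e-3):
    """Adapt a segmentation model to one test image by minimizing the
    entropy of its per-pixel class predictions. Only normalization-layer
    parameters are updated, so no training data or labels are needed."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.GroupNorm, nn.LayerNorm)):
            for p in m.parameters():
                p.requires_grad_(True)
                params.append(p)
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        logits = model(image)                      # (1, C, H, W)
        probs = F.softmax(logits, dim=1)
        # mean per-pixel entropy; clamp log to avoid -inf at prob ~ 0
        entropy = -(probs * probs.log().clamp(min=-20)).sum(dim=1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return model
```

Restricting updates to normalization parameters keeps the adaptation lightweight and reduces the risk of the model collapsing to a trivial low-entropy solution on a single image.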

2021 – 2022
AI-Assisted Labeling for Civil Infrastructure Inspection

Leveraging model explainability and foundation models to bootstrap labeling for visual inspection of bridges and buildings. Research internship at IBM Research Zurich.

2019 – 2021
Scene Text Detection & Recognition

Exploring real-world and synthetic data sources for training robust scene text detection and recognition models. Research visit to the Computer Vision Centre (CVC) at UAB, Barcelona.