Research Projects
Active research directions and past work.
I am currently interested in spatial understanding for vision-language models (VLMs), in particular improving spatial representations for VLMs and for efficient VLMs. This builds on our prior work on context-aware object recognition and closed-form adaptation (Koo-Fu CLIP).
The aim is a complete reannotation of the validation set. The project is ongoing and will be published soon. It builds on our analysis of ImageNet flaws and on VLM-based recognition methods.
Investigating reinforcement learning for learned memory control in the Segment Anything Model 2 (SAM2), with the goal of improving long-form video object segmentation by dynamically managing the memory bank.
Creating large-scale multi-modal datasets and challenges for fine-grained visual categorization of fungi and other species. Co-organized the FungiCLEF competition and the FGVC @ CVPR 2025 workshop edition, and contributed to the broader LifeCLEF benchmarking ecosystem.
Methods for adapting segmentation models at test time using only a single image, without access to training data or labels. Work done in collaboration with Chaim Baskin and Alex Bronstein (Technion).
Leveraging model explainability and foundation models to bootstrap labeling for visual inspection of bridges and buildings. Research internship at IBM Research Zurich.
Exploring real-world and synthetic data sources for training robust scene text detection and recognition models. Research visit to the Computer Vision Centre (CVC) at UAB, Barcelona.