Visual Recognition Group · CTU Prague

Aiming for Perfect ImageNet-1K

ImageNet-1K has long served as the gold standard for evaluating visual recognition models. As models improve, subtle yet impactful issues in its labels, class definitions, and structure become increasingly consequential. This page gathers our ongoing research efforts to identify, analyze, and correct the dataset — from a detailed audit of known flaws, to benchmarking state-of-the-art models on corrected variants, to a full-scale reannotation.

Visual Recognition Group, Czech Technical University in Prague  ·  Jiri Matas' group
Main Contributors: Nikita Kisel, Illia Volkov, Klara Janouskova, Jiri Matas
Annotators: Tetiana, Evita, Armands, Filip, Larisa, Richard, Kaci
Wildlife Expert: Katka
Projects
[Figure: six example images with quoted class labels and ✓/✗ marks: lynx ("lynx, …" ✗), timber wolf ("timber wolf" ✓), dingo ("dingo, …" ✗), white wolf ("white wolf" ✗), red wolf ("red wolf, …" ✗), coyote ("coyote, …" ✗)]
Blog Post · ICLR 2025
Flaws of ImageNet, Computer Vision's Favorite Dataset
Nikita Kisel, Illia Volkov, Kateřina Hanzelková, Klara Janouskova*, Jiri Matas
ICLR 2025 Blogpost Track

A comprehensive analysis of ImageNet-1K's known issues: incorrect labels, overlapping and ambiguous class definitions, training–validation domain shifts, duplicate images, and problems with ILSVRC class selection. A starting point for the broader reannotation effort.

[Figure: Precision-based fusion]
Paper · BMVC 2025
Image Recognition with Vision and Language Embeddings of VLMs
Illia Volkov, Nikita Kisel, Klara Janouskova, Jiri Matas
British Machine Vision Conference (BMVC) 2025

We evaluate dual-encoder vision–language models for image classification, examining text-guided and vision-only approaches. We analyse key performance factors — prompt design, class diversity, k-NN count, reference set size — and introduce a learning-free fusion method based on per-class precision, tested on ImageNet-1K and corrected variants.
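The summary above does not spell out the fusion rule. As a rough illustration only, the sketch below assumes one plausible reading: estimate each classifier's per-class precision on held-out data, then, for each test image, keep the prediction whose class was more precise. The function names, the tie-breaking rule, and the toy data are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def per_class_precision(predictions, labels):
    """Precision per predicted class: correct predictions / all predictions of that class."""
    predicted = defaultdict(int)
    correct = defaultdict(int)
    for pred, label in zip(predictions, labels):
        predicted[pred] += 1
        if pred == label:
            correct[pred] += 1
    return {cls: correct[cls] / predicted[cls] for cls in predicted}


def fuse(text_pred, knn_pred, text_precision, knn_precision):
    """Keep the k-NN prediction only if its class was strictly more precise
    on held-out data; ties and unseen classes fall back to the text classifier
    (an assumption made for this sketch)."""
    if knn_precision.get(knn_pred, 0.0) > text_precision.get(text_pred, 0.0):
        return knn_pred
    return text_pred


# Made-up held-out labels and predictions, for illustration only.
held_out_labels = ["lynx", "coyote", "dingo", "coyote", "lynx", "dingo"]
text_held_out = ["lynx", "coyote", "coyote", "coyote", "lynx", "dingo"]
knn_held_out = ["lynx", "dingo", "dingo", "coyote", "coyote", "dingo"]

text_precision = per_class_precision(text_held_out, held_out_labels)
knn_precision = per_class_precision(knn_held_out, held_out_labels)

# The classifiers disagree on two test images; fusion picks the more precise class.
fused_first = fuse("coyote", "lynx", text_precision, knn_precision)
fused_second = fuse("dingo", "coyote", text_precision, knn_precision)
print(fused_first)   # lynx
print(fused_second)  # dingo
```

No parameters are trained here: the rule only needs counts from a labelled held-out set, which is what makes a scheme of this kind learning-free.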

[Figure: MLLM classification tasks]
Paper · arXiv 2025
Multimodal Large Language Models as Image Classifiers
Nikita Kisel, Illia Volkov, Kateřina Hanzelková, Klara Janouskova, Jiri Matas
arXiv preprint, 2025

We benchmark multimodal large language models as image classifiers on ImageNet-1K and a partially reannotated variant, revealing how annotation noise affects model rankings and benchmark reliability — and the advantage MLLMs hold over discriminative models in handling label ambiguity.
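To make the effect on rankings concrete, the toy sketch below scores two hypothetical models against single original labels and against reannotated label sets that may accept more than one class per image. All labels, predictions, and model names are invented for illustration and do not come from the paper or the dataset.

```python
def accuracy(predictions, acceptable):
    """Fraction of predictions that fall inside the acceptable label set of each image."""
    hits = sum(pred in labels for pred, labels in zip(predictions, acceptable))
    return hits / len(predictions)


def rank(models, acceptable):
    """Model names sorted from most to least accurate under the given annotation."""
    return sorted(models, key=lambda name: accuracy(models[name], acceptable), reverse=True)


# Invented annotations for four images: one label each in the original,
# corrected and possibly multi-label sets in the reannotated variant.
original = [{"timber wolf"}, {"coyote"}, {"red wolf"}, {"dingo"}]
reannotated = [{"timber wolf"}, {"coyote", "red wolf"}, {"coyote"}, {"dingo"}]

# Invented predictions of two hypothetical models.
models = {
    "model_a": ["timber wolf", "coyote", "red wolf", "white wolf"],
    "model_b": ["timber wolf", "red wolf", "coyote", "dingo"],
}

for name, preds in models.items():
    print(name, accuracy(preds, original), accuracy(preds, reannotated))
# model_a 0.75 0.5
# model_b 0.5 1.0

print(rank(models, original))     # ['model_a', 'model_b']
print(rank(models, reannotated))  # ['model_b', 'model_a']
```

In this toy case the model that fits the noisy labels best ranks first on the original annotation and last on the corrected one, which is the kind of ranking instability the benchmark is meant to expose.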

Dataset · Work in Progress
ImageNet-1K Complete Reannotation
Visual Recognition Group, CTU Prague
In preparation

A comprehensive reannotation of the full ImageNet-1K validation set, addressing label errors, ambiguous class definitions, and multi-label instances identified through our prior analysis. Aiming to provide a cleaner, more reliable evaluation benchmark for the community.

Publications
Image Recognition with Vision and Language Embeddings of VLMs
Illia Volkov, Nikita Kisel, Klara Janouskova, Jiri Matas
British Machine Vision Conference (BMVC), 2025
@inproceedings{volkov2025image,
  title     = {Image Recognition with Vision and Language
               Embeddings of {VLMs}},
  author    = {Volkov, Illia and Kisel, Nikita and
               Janouskova, Klara and Matas, Jiri},
  booktitle = {British Machine Vision Conference ({BMVC})},
  year      = {2025},
  url       = {https://arxiv.org/abs/2509.09311}
}
Flaws of ImageNet, Computer Vision's Favorite Dataset
Nikita Kisel, Illia Volkov, Kateřina Hanzelková, Klara Janouskova*, Jiri Matas
ICLR 2025 Blogpost Track, 2025
@article{kisel2025flaws,
  title   = {Flaws of {ImageNet}, Computer Vision's
             Favorite Dataset},
  author  = {Kisel, Nikita and Volkov, Illia and
             Hanzelkov{\'a}, Kate{\v{r}}ina and
             Janouskova, Klara and Matas, Jiri},
  journal = {{ICLR} 2025 Blogpost Track},
  year    = {2025},
  url     = {https://klarajanouskova.github.io/ImageNet/blog/}
}
Multimodal Large Language Models as Image Classifiers
Nikita Kisel, Illia Volkov, Kateřina Hanzelková, Klara Janouskova, Jiri Matas
arXiv preprint, 2025
@article{kisel2025multimodal,
  title   = {Multimodal Large Language Models as Image Classifiers},
  author  = {Kisel, Nikita and Volkov, Illia and
             Hanzelkov{\'a}, Kate{\v{r}}ina and
             Janouskova, Klara and Matas, Jiri},
  journal = {arXiv preprint},
  year    = {2025},
  url     = {https://arxiv.org/abs/2603.06578}
}