S2R2: Segment to Recognize Robustly – Enhancing Recognition by Image Decomposition

Visual Recognition Group, Czech Technical University in Prague

We are working on polishing the code to make it available as soon as possible.

Abstract

In image recognition, both foreground (FG) and background (BG) play an important role; however, standard deep image recognition often leads to unintended over-reliance on the BG, limiting model robustness in real-world deployment settings. Current solutions mainly suppress the BG, sacrificing BG information for improved generalization.

We propose "Segment to Recognize Robustly" (S2R2), a novel recognition approach that decouples FG and BG modelling and combines them in a simple, robust and interpretable manner. S2R2 leverages recent advances in zero-shot segmentation to isolate the FG and the BG before or during recognition. By combining FG and BG, potentially also with a standard full-image classifier, S2R2 achieves state-of-the-art results on in-domain data while maintaining robustness to BG shifts.
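
Until the official code is released, the decoupling idea can be illustrated with a minimal sketch. It is not the S2R2 implementation: the function names (`split_fg_bg`, `fuse_predictions`), the weighted-average fusion rule, and the `fg_weight` value are all assumptions made for illustration, and the binary mask is assumed to come from some external zero-shot segmenter.

```python
import numpy as np


def split_fg_bg(image, mask):
    """Split an H x W x C image into FG and BG images using a binary H x W mask.

    The mask is assumed to come from an external segmenter (1 = foreground).
    """
    mask3 = mask.astype(image.dtype)[..., None]  # broadcast over channels
    fg = image * mask3          # keep foreground pixels, zero out the rest
    bg = image * (1 - mask3)    # keep background pixels, zero out the rest
    return fg, bg


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def fuse_predictions(fg_logits, bg_logits, fg_weight=0.7):
    """Hypothetical fusion rule: weighted average of FG and BG class probabilities."""
    return fg_weight * softmax(fg_logits) + (1 - fg_weight) * softmax(bg_logits)


# Toy usage: a 4 x 4 RGB image whose top-left 2 x 2 block is foreground.
image = np.ones((4, 4, 3), dtype=np.float32)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 1
fg, bg = split_fg_bg(image, mask)

# Stand-ins for the outputs of separate FG and BG classifiers (3 classes).
fg_logits = np.array([2.0, 0.0, 0.0])
bg_logits = np.array([0.0, 1.0, 0.0])
probs = fuse_predictions(fg_logits, bg_logits)
```

Keeping the FG and BG predictions separate until the final fusion step is what makes the combination inspectable: each branch's contribution to the final class probabilities can be read off directly.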