CLIP the Gap: A Single Domain Generalization Approach for Object Detection

Published in CVPR, 2023

Abstract

Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss. Our experiments demonstrate the benefits of our approach, outperforming the only existing SDG object detection method, Single-DGOD, by 10% on their own diverse-weather driving benchmark.
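The text-based classification loss mentioned above can be illustrated with a minimal sketch: region features are classified by cosine similarity against per-class text embeddings, as popularized by CLIP. This is not the paper's implementation — the random vectors standing in for CLIP's frozen text encoder, the temperature value, and the toy region features are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project vectors onto the unit sphere, as CLIP does before comparison."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical stand-ins: in the actual method these embeddings would come
# from CLIP's frozen text encoder applied to prompts such as
# "a photo of a {class}". Here we use random unit vectors for illustration.
rng = np.random.default_rng(0)
num_classes, dim = 3, 512
text_embeds = l2_normalize(rng.normal(size=(num_classes, dim)))

def text_based_logits(region_feats, text_embeds, temperature=0.01):
    """Score each region feature by cosine similarity to every class embedding."""
    feats = l2_normalize(region_feats)
    return feats @ text_embeds.T / temperature

def cross_entropy(logits, labels):
    """Standard cross-entropy over the similarity logits."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy region features: each aligned with its class embedding plus noise,
# mimicking backbone features that CLIP's image-text alignment makes comparable.
labels = np.array([0, 1, 2])
region_feats = text_embeds[labels] + 0.1 * rng.normal(size=(3, dim))
loss = cross_entropy(text_based_logits(region_feats, text_embeds), labels)
```

Because the classifier weights are text embeddings rather than learned vectors, the same mechanism lets textual domain prompts shift the feature statistics, which is the intuition behind the semantic augmentation described above.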


Citation

If you found this work useful, please cite the associated paper:

V. Vidit, M. Engilberge, and M. Salzmann, “CLIP the Gap: A Single Domain Generalization Approach for Object Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

BibTeX:

@inproceedings{vidit2023clip,
  title={CLIP the Gap: A Single Domain Generalization Approach for Object Detection},
  author={Vidit, Vidit and Engilberge, Martin and Salzmann, Mathieu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}