Self-supervised representation learning and applications to image and video analysis

Published in Theses.fr, 2023

Recommended citation: "Self-supervised representation learning and applications to image and video analysis." Theses.fr. J. Denize, Normandie, 2023.

Abstract

In this thesis, we develop approaches to self-supervised learning for image and video analysis. Self-supervised representation learning makes it possible to pretrain neural networks to learn general concepts without labels, so that they can then specialize in downstream tasks faster and with few annotations. We present three contributions to self-supervised image and video representation learning. First, we introduce the theoretical paradigm of soft contrastive learning and its practical implementation, Similarity Contrastive Estimation (SCE), which connects contrastive and relational learning for image representation. Second, we extend SCE to global temporal video representation learning. Lastly, we propose COMEDIAN, a pipeline for local-temporal video representation learning for transformers. These contributions achieved state-of-the-art results on multiple benchmarks and led to several published academic and technical contributions.

Resources

Paper Link - Code Link

Citation

If you found our work useful, please consider citing us:

@PHDTHESIS{deniz2023,
  url = "http://www.theses.fr/2023NORMIR37",
  title = "Self-supervised representation learning and applications to image and video analysis",
  author = "Denize, Julien",
  year = "2023",
  note = "Doctoral thesis supervised by Hérault, Romain. Computer science. Normandie, 2023. 2023NORMIR37. Full text: http://www.theses.fr/2023NORMIR37/document",
}