Research
My primary research interests are in the fields of Machine Learning, Computer Vision and Natural Language Processing.
From 2017 to 2020, I completed a PhD focused on joint visual and textual learning, specifically on Visual-Semantic Embeddings (VSE).
I was advised by Matthieu Cord, Patrick Pérez and Louis Chevallier.
During this time, I worked on multimodal representation learning applied to several tasks: cross-modal retrieval, phrase grounding, ranking optimization, known-instance search, and iterative search.
I am excited about upcoming research challenges, with a keen interest in exploring self-supervised and reasoning methods to advance multimodal interaction.
Projects
Multi-view Tracking Using Weakly Supervised Human Motion Prediction
M. Engilberge, W. Liu, P. Fua
WACV, 2023
Two-level Data Augmentation for Calibrated Multi-view Detection
M. Engilberge, H. Shi, Z. Wang, P. Fua
WACV, 2023
VideoMem: Constructing, Analyzing, Predicting Short-Term and Long-Term Video Memorability
R. Cohendet, C. Demarty, N. Duong, M. Engilberge
ICCV, 2019
SoDeep: A Sorting Deep Net to Learn Ranking Loss Surrogates
M. Engilberge, L. Chevallier, P. Pérez, M. Cord
CVPR, 2019
Finding Beans in Burgers: Deep Semantic-Visual Embedding with Localization
M. Engilberge, L. Chevallier, P. Pérez, M. Cord
CVPR, 2018