Deep Inside Visual-Semantic Embeddings

Published in PhD Thesis, 2020


In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore several directions. First, we present the relevant background in Chapter 2, covering image and textual representations as well as existing multimodal approaches. Then, in Chapter 3, we propose novel architectures that further improve the retrieval capability of VSE models. In Chapter 4, we extend VSE models to novel applications and leverage embedding models to visually ground semantic concepts. Finally, in Chapter 5, we delve into the learning process, and in particular the loss function.



If you found this work useful, please cite the associated paper:

M. Engilberge, "Deep Inside Visual-Semantic Embeddings," 2020


@phdthesis{engilberge2020deep,
  title = {Deep Inside Visual-Semantic Embeddings},
  author = {Engilberge, Martin},
  year = {2020}
}