M Hahn, A Silva, JM Rehg - arXiv e-prints, 2019 - ui.adsabs.harvard.edu
We describe a novel cross-modal embedding space for actions, named Action2Vec, which combines linguistic cues from class labels with spatio-temporal features derived from video …