that recognize and represent the inherently perspectivized nature of multimodal
communication. To support our claim, we present a set of annotation experiments in which
FrameNet annotation is applied to the Multi30k and the Flickr30k Entities datasets. We
assess the cosine similarity between the semantic representations derived from the
frame annotation of both pictures and captions. Our findings indicate that: (i) frame …