interpretable language-independent deep learning architecture. We specifically target
discontinuity, an under-explored aspect that poses a significant challenge to the computational
treatment of MWEs. Two neural architectures are explored: a Graph Convolutional Network
(GCN) and multi-head self-attention. The GCN leverages dependency parse information, while
self-attention attends to long-range relations. We finally propose a combined model that …
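To make the two components concrete, the following is a minimal illustrative sketch, assuming PyTorch, of a GCN layer operating over a dependency adjacency matrix combined with multi-head self-attention for per-token tagging. All class names, dimensions, and the concatenation-based combination are hypothetical choices for illustration, not the architecture proposed in this work.

```python
# Illustrative sketch only: a dependency-based GCN layer plus multi-head
# self-attention, combined by concatenating their per-token outputs.
# Names and dimensions are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class DependencyGCNLayer(nn.Module):
    """One GCN layer: each token aggregates features from its dependency neighbours."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (batch, seq_len, dim); adj: (batch, seq_len, seq_len) dependency adjacency
        # Add self-loops and row-normalise so each token averages over its neighbours.
        adj = adj + torch.eye(adj.size(-1), device=adj.device)
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        h = torch.bmm(adj / deg, x)           # neighbourhood aggregation
        return torch.relu(self.linear(h))     # linear transform + non-linearity


class GCNAttentionTagger(nn.Module):
    """Combines a GCN view (syntactic locality) with self-attention (long-range relations)."""

    def __init__(self, dim=128, heads=4, num_tags=3):
        super().__init__()
        self.gcn = DependencyGCNLayer(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(2 * dim, num_tags)   # concatenate both views per token

    def forward(self, x, adj):
        g = self.gcn(x, adj)                      # dependency-based representation
        a, _ = self.attn(x, x, x)                 # attention-based representation
        return self.out(torch.cat([g, a], dim=-1))  # per-token tag scores


if __name__ == "__main__":
    batch, seq_len, dim = 2, 10, 128
    x = torch.randn(batch, seq_len, dim)                          # token embeddings
    adj = torch.randint(0, 2, (batch, seq_len, seq_len)).float()  # toy dependency graph
    print(GCNAttentionTagger(dim)(x, adj).shape)                  # torch.Size([2, 10, 3])
```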