Attention in Machine Learning allows a model to selectively up-weight informative parts of an input relative to others. The Vision Transformer (ViT) is based entirely on attention. ViTs have shown state-of-the-art performance in multiple fields, including person re-identification, presentation attack detection, and object recognition. Several works have shown that embedding human attention into a Machine Learning pipeline can improve performance or compensate for a lack of data. However, the correlation between the attention of computer vision models and human attention has not yet been investigated. In this paper, we explore the intersection of human and Transformer attention. To this end, we collect a new dataset of human fixations, the University of Sassari Face Fixation Dataset (Uniss-FFD), and show through a quantitative analysis that correlations exist between these two modalities. The dataset described in this paper is available at https://github.com/CVLab-Uniss/Uniss-FFD.