Authors
Anish Kumar Vishwakarma, Kishor M Bhurchandi
Publication date
2022/11/4
Book
International Conference on Computer Vision and Image Processing
Pages
638-645
Publisher
Springer Nature Switzerland
Description
This work proposes a reliable and efficient end-to-end No-Reference Video Quality Assessment (NR-VQA) model that fuses deep spatial and temporal features. Since both spatial (semantic) and temporal (motion) features significantly affect video quality, we combine the two to build an effective and fast video quality predictor. ResNet-50, a well-known pre-trained image classification model, is employed to extract semantic features from video frames, whereas I3D, a well-known pre-trained action recognition model, is used to compute spatiotemporal features from short video clips. The extracted features are then passed through a regressor head consisting of a Gated Recurrent Unit (GRU) followed by a Fully Connected (FC) layer. Four popular and widely used authentic-distortion databases, LIVE-VQC, KoNViD-1k, LIVE-Qualcomm, and CVD2014, are utilized for validating the performance …
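The regressor head described above (a GRU over per-clip feature sequences, followed by an FC layer producing a single quality score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions (2048 for ResNet-50 pooled features, 1024 for I3D), the hidden size, and the class name are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class VQARegressorHead(nn.Module):
    """Hypothetical sketch of a GRU + FC regressor head for NR-VQA.

    Maps a sequence of fused per-clip features (ResNet-50 semantic
    features concatenated with I3D spatiotemporal features) to one
    predicted quality score per video. Dimensions are assumed, not
    taken from the paper.
    """

    def __init__(self, feat_dim=2048 + 1024, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, num_clips, feat_dim) fused feature sequences
        _, h = self.gru(x)              # h: (1, batch, hidden_dim), last hidden state
        return self.fc(h.squeeze(0))    # (batch, 1) predicted quality scores

head = VQARegressorHead()
feats = torch.randn(4, 8, 2048 + 1024)  # 4 videos, 8 clips each
scores = head(feats)
print(scores.shape)  # torch.Size([4, 1])
```

The GRU summarizes temporal variation across clips into its final hidden state, and the FC layer regresses that summary to a scalar score; during training one would typically fit this head with an L1 or L2 loss against subjective mean opinion scores.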
Scholar articles
AK Vishwakarma, KM Bhurchandi - International Conference on Computer Vision and …, 2022