Enhanced video analytics for sentiment analysis based on fusing textual, auditory and visual information

S Al-Azani, ESM El-Alfy - IEEE Access, 2020 - ieeexplore.ieee.org
With the spread of online videos and digital transformation, video informatics and analytics have gained substantial importance, with impressive success in a variety of tasks such as digital marketing, video surveillance and security systems, healthcare systems, talk show analysis, analysis of influencing groups in social media, and target tracking. This paper evaluates the potential contribution of various video modalities, and how they are correlated, to video analytics for sentiment analysis in the morphologically rich Arabic language. Moreover, an enhanced approach to video analytics is presented that predicts the sentiment of speakers of multi-dialect Arabic by integrating the textual, auditory and visual modalities. Different features are extracted to represent each modality: prosodic and spectral acoustic features for the audio, neural word embeddings for the text transcript of the audio, and dense optical-flow descriptors for the visual modality. The extracted features are first used individually to train two machine learning classifiers, which provide a baseline. The effectiveness of various combinations of modalities is then verified using multi-level fusion (feature, score and decision). The experimental results demonstrate that the proposed approach of combining different modalities can lead to more accurate prediction of the speaker's sentiment, with accuracy above 94%.
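The three fusion levels named in the abstract (feature, score and decision) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the equal default weights, the majority-vote rule and the toy probabilities below are all assumptions chosen only to show how the levels differ.

```python
import numpy as np

def feature_level_fusion(text_feats, audio_feats, visual_feats):
    """Feature level: concatenate per-sample feature vectors of all modalities.
    The fused matrix would then be fed to a single classifier."""
    return np.concatenate([text_feats, audio_feats, visual_feats], axis=1)

def score_level_fusion(score_list, weights=None):
    """Score level: weighted average of per-modality class-probability matrices.
    Equal weights are an assumption, not a value taken from the paper."""
    scores = np.stack(score_list, axis=0)      # (n_modalities, n_samples, n_classes)
    if weights is None:
        weights = np.full(len(score_list), 1.0 / len(score_list))
    fused = np.tensordot(np.asarray(weights, dtype=float), scores, axes=1)
    return fused.argmax(axis=1)                # predicted class per sample

def decision_level_fusion(label_list):
    """Decision level: majority vote over per-modality predicted labels.
    Labels must be non-negative integers; ties go to the smallest label."""
    labels = np.stack(label_list, axis=0)      # (n_modalities, n_samples)
    return np.array([np.bincount(column).argmax() for column in labels.T])

# Hypothetical class probabilities (negative, positive) for three video clips.
text_scores = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
audio_scores = np.array([[0.6, 0.4], [0.7, 0.3], [0.3, 0.7]])
visual_scores = np.array([[0.3, 0.7], [0.2, 0.8], [0.6, 0.4]])
all_scores = [text_scores, audio_scores, visual_scores]

score_pred = score_level_fusion(all_scores)
vote_pred = decision_level_fusion([s.argmax(axis=1) for s in all_scores])
```

Feature-level fusion combines the modalities before any classifier is trained, whereas score-level and decision-level fusion combine the outputs of classifiers trained separately on each modality.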