Automatic prediction of emotions induced by movies

Y Baveye - 2015 - theses.hal.science
2015theses.hal.science
Never before have movies been as easily accessible to viewers, who can enjoy anywhere
the almost unlimited potential of movies for inducing emotions. Thus, knowing in advance
the emotions that a movie is likely to elicit to its viewers could help to improve the accuracy
of content delivery, video indexing or even summarization. However, transferring this
expertise to computers is a complex task due in part to the subjective nature of emotions.
The present thesis work is dedicated to the automatic prediction of emotions induced by …
Never before have movies been as easily accessible to viewers, who can enjoy anywhere the almost unlimited potential of movies for inducing emotions. Thus, knowing in advance the emotions that a movie is likely to elicit to its viewers could help to improve the accuracy of content delivery, video indexing or even summarization. However, transferring this expertise to computers is a complex task due in part to the subjective nature of emotions. The present thesis work is dedicated to the automatic prediction of emotions induced by movies based on the intrinsic properties of the audiovisual signal. To computationally deal with this problem, a video dataset annotated along the emotions induced to viewers is needed. However, existing datasets are not public due to copyright issues or are of a very limited size and content diversity. To answer to this specific need, this thesis addresses the development of the LIRIS-ACCEDE dataset. The advantages of this dataset are threefold: (1) it is based on movies under Creative Commons licenses and thus can be shared without infringing copyright, (2) it is composed of 9,800 good quality video excerpts with a large content diversity extracted from 160 feature films and short films, and (3) the 9,800 excerpts have been ranked through a pair-wise video comparison protocol along the induced valence and arousal axes using crowdsourcing. The high inter-annotator agreement reflects that annotations are fully consistent, despite the large diversity of raters’ cultural backgrounds. Three other experiments are also introduced in this thesis. First, affective ratings were collected for a subset of the LIRIS-ACCEDE dataset in order to cross-validate the crowdsourced annotations. The affective ratings made also possible the learning of Gaussian Processes for Regression, modeling the noisiness from measurements, to map the whole ranked LIRIS-ACCEDE dataset into the 2D valence-arousal affective space. Second, continuous ratings for 30 movies were collected in order develop temporally relevant computational models. Finally, a last experiment was performed in order to collect continuous physiological measurements for the 30 movies used in the second experiment. The correlation between both modalities strengthens the validity of the results of the experiments. Armed with a dataset, this thesis presents a computational model to infer the emotions induced by movies. The framework builds on the recent advances in deep learning and takes into account the relationship between consecutive scenes. It is composed of two fine-tuned Convolutional Neural Networks. One is dedicated to the visual modality and uses as input crops of key frames extracted from video segments, while the second one is dedicated to the audio modality through the use of audio spectrograms. The activations of the last fully connected layer of both networks are conv catenated to feed a Long Short-Term Memory Recurrent Neural Network to learn the dependencies between the consecutive video segments. The performance obtained by the model is compared to the performance of a baseline similar to previous work and shows very promising results but reflects the complexity of such tasks. Indeed, the automatic prediction of emotions induced by movies is still a very challenging task which is far from being solved.
theses.hal.science
以上显示的是最相近的搜索结果。 查看全部搜索结果