The recent increase in interest in online multimedia streaming platforms has made available massive amounts of multimedia information that needs to be indexed to be searchable and retrievable. User-centric implicit affective indexing, which employs emotion detection based on psycho-physiological signals such as electrocardiography (ECG), galvanic skin response (GSR), electroencephalography (EEG), and face tracking, has recently gained attention. However, real-world psycho-physiological signals obtained from wearable devices and facial trackers are contaminated by various noise sources that can result in spurious emotion detection. Therefore, in this paper, we propose psycho-physiological signal quality estimators for unimodal affect recognition systems. The presented systems perform adequately in classifying users' affect; however, they exhibit high failure rates due to the rejection of poor-quality samples. Thus, to reduce the affect recognition failure rate, a quality-adaptive multimodal fusion scheme is proposed. The proposed scheme yields no failures while classifying users' arousal/valence and liking with weighted F1-scores significantly above chance in a cross-user experiment. Another finding of this study is that head movements encode users' liking perception in response to music snippets. This work also includes the release of the employed dataset, comprising the psycho-physiological signals, their quality annotations, and users' affective self-assessments.